2023-11-27 11:49:48,589 INFO [train_asr.py:1303] (3/4) Training started
2023-11-27 11:49:48,589 INFO [train_asr.py:1313] (3/4) Device: cuda:3
2023-11-27 11:49:48,592 INFO [train_asr.py:1325] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'a9ea720f-dirty', 'icefall-git-date': 'Wed Nov 22 17:48:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-0423201334-6587bbc68d-tn554', 'IP address': '10.177.74.211'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 60, 'start_epoch': 39, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'stop_early': False, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'beats_label': False, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-27 11:49:48,592 INFO [train_asr.py:1334] (3/4) About to create model
2023-11-27 11:49:49,301 INFO [train_asr.py:1338] (3/4) Number of model parameters: 65819362
2023-11-27 11:49:49,302 INFO [train_asr.py:1362] (3/4) Using CED labels!
2023-11-27 11:49:49,302 INFO [checkpoint.py:112] (3/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-38.pt
2023-11-27 11:49:52,751 INFO [train_asr.py:1370] (3/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-27 11:49:55,276 INFO [train_asr.py:1379] (3/4) Using DDP
2023-11-27 11:49:55,805 INFO [train_asr.py:1402] (3/4) Loading optimizer state dict
2023-11-27 11:49:56,474 INFO [train_asr.py:1410] (3/4) Loading scheduler state dict
2023-11-27 11:49:56,476 INFO [train_asr.py:1432] (3/4) Getting audioset cuts
2023-11-27 11:49:56,476 INFO [kd_datamodule.py:784] (3/4) About to get the audioset cuts.
2023-11-27 11:49:56,478 INFO [train_asr.py:1438] (3/4) Using mux to combine Librispeech with audioset
2023-11-27 11:49:56,478 INFO [train_asr.py:1449] (3/4) CutSet(len=2748469) [underlying data type: ]
2023-11-27 11:50:05,356 INFO [kd_datamodule.py:396] (3/4) Enable MUSAN
2023-11-27 11:50:05,356 INFO [kd_datamodule.py:397] (3/4) About to get Musan cuts
2023-11-27 11:50:08,225 INFO [kd_datamodule.py:427] (3/4) Enable SpecAugment
2023-11-27 11:50:08,225 INFO [kd_datamodule.py:428] (3/4) Time warp factor: 80
2023-11-27 11:50:08,225 INFO [kd_datamodule.py:438] (3/4) Num frame mask: 10
2023-11-27 11:50:08,226 INFO [kd_datamodule.py:451] (3/4) About to create train dataset
2023-11-27 11:50:08,226 INFO [kd_datamodule.py:487] (3/4) Using SimpleCutSampler
2023-11-27 11:50:08,227 INFO [kd_datamodule.py:495] (3/4) About to create train dataloader
2023-11-27 11:50:08,229 INFO [kd_datamodule.py:802] (3/4) About to get the audioset eval cuts.
2023-11-27 11:50:08,230 INFO [train_asr.py:1513] (3/4) CutSet(len=20681) [underlying data type: ]
2023-11-27 11:50:08,283 INFO [kd_datamodule.py:529] (3/4) About to create dev dataset
2023-11-27 11:50:08,719 INFO [kd_datamodule.py:550] (3/4) About to create dev dataloader
2023-11-27 11:50:08,720 INFO [train_asr.py:1527] (3/4) Loading grad scaler state dict
2023-11-27 11:50:28,613 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 0, loss[loss=0.07895, simple_loss=0.1078, pruned_loss=0.01193, audio_tagging_loss=0.0131, over 15471.00 frames. ], tot_loss[loss=0.07895, simple_loss=0.1078, pruned_loss=0.01193, audio_tagging_loss=0.0131, over 15471.00 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:50:28,613 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-27 11:50:54,978 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3498, 5.0104, 4.6363, 5.1772], device='cuda:3')
2023-11-27 11:51:02,903 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.0578, simple_loss=0.05083, pruned_loss=0.005245, audio_tagging_loss=0.02714, over 4681554.00 frames.
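[Annotation] The loss[...] fields in the batch lines above decompose the total into its components. A minimal sketch of how they would combine, assuming the total uses the configured scales from the parameter dump (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0); the helper name is hypothetical, not from train_asr.py, but the batch 0 numbers above are numerically consistent with it:

    # Hypothetical helper illustrating how the logged loss components could combine,
    # assuming loss = simple_loss_scale * simple_loss + pruned_loss
    #                 + audio_tagging_loss_scale * audio_tagging_loss.
    def combine_losses(simple_loss: float, pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Batch 0 above: 0.5 * 0.1078 + 0.01193 + 1.0 * 0.0131 = 0.07893 ~= 0.07895.
    assert abs(combine_losses(0.1078, 0.01193, 0.0131) - 0.07895) < 1e-3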
2023-11-27 11:51:02,904 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-27 11:51:07,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3046020.0, ans=0.125
2023-11-27 11:51:10,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=3046020.0, ans=12.0
2023-11-27 11:51:12,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3046020.0, ans=0.0
2023-11-27 11:51:24,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3046086.6666666665, ans=0.0
2023-11-27 11:51:44,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0
2023-11-27 11:51:45,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0
2023-11-27 11:51:49,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5
2023-11-27 11:51:55,791 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 456950
2023-11-27 11:52:01,314 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 50, loss[loss=0.07668, simple_loss=0.0956, pruned_loss=0.01587, audio_tagging_loss=0.01301, over 14589.00 frames. ], tot_loss[loss=0.07579, simple_loss=0.09169, pruned_loss=0.01325, audio_tagging_loss=0.0167, over 690879.47 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:52:22,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0
2023-11-27 11:52:25,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.249e+01 9.464e+01 1.034e+02 1.107e+02 1.312e+02, threshold=2.068e+02, percent-clipped=0.0
2023-11-27 11:52:30,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046486.6666666665, ans=0.1
2023-11-27 11:52:40,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3046553.3333333335, ans=0.035
2023-11-27 11:52:47,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0
2023-11-27 11:52:54,131 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457000
2023-11-27 11:53:00,043 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 100, loss[loss=0.07224, simple_loss=0.09268, pruned_loss=0.01175, audio_tagging_loss=0.01416, over 15334.00 frames. ], tot_loss[loss=0.07448, simple_loss=0.09105, pruned_loss=0.01301, audio_tagging_loss=0.01595, over 1212908.73 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:53:02,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3046686.6666666665, ans=0.125
2023-11-27 11:53:17,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=15.0
2023-11-27 11:53:34,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3046886.6666666665, ans=0.125
2023-11-27 11:53:51,128 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457050
2023-11-27 11:53:56,551 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 150, loss[loss=0.06429, simple_loss=0.09863, pruned_loss=0.007496, audio_tagging_loss=0.007478, over 14886.00 frames. ], tot_loss[loss=0.07264, simple_loss=0.09101, pruned_loss=0.01282, audio_tagging_loss=0.01432, over 1625670.22 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:54:02,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3047020.0, ans=0.125
2023-11-27 11:54:13,375 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 11:54:19,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.988e+01 9.589e+01 1.001e+02 1.163e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-27 11:54:36,744 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 11:54:47,782 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457100
2023-11-27 11:54:53,334 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 200, loss[loss=0.07088, simple_loss=0.09107, pruned_loss=0.01643, audio_tagging_loss=0.00891, over 14999.00 frames. ], tot_loss[loss=0.07077, simple_loss=0.09037, pruned_loss=0.01281, audio_tagging_loss=0.01278, over 1940682.07 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:55:02,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3047353.3333333335, ans=0.09899494936611666
2023-11-27 11:55:32,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3047553.3333333335, ans=0.1
2023-11-27 11:55:38,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3047620.0, ans=0.0
2023-11-27 11:55:45,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457150
2023-11-27 11:55:46,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3047620.0, ans=0.125
2023-11-27 11:55:51,213 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 250, loss[loss=0.05739, simple_loss=0.07973, pruned_loss=0.007938, audio_tagging_loss=0.009583, over 14915.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.09021, pruned_loss=0.01294, audio_tagging_loss=0.01154, over 2184942.80 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:56:06,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3047753.3333333335, ans=0.2
2023-11-27 11:56:14,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.936e+01 9.538e+01 1.043e+02 1.286e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-27 11:56:15,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=22.5
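[Annotation] The Clipping_scale lines above print the 0/25/50/75/100% quartiles of recent gradient norms plus a clipping threshold. A minimal sketch consistent with those numbers, assuming threshold = Clipping_scale x median grad-norm (the function below is illustrative, not the optim.py implementation):

    # Illustrative only: reproduces the relationship visible in the log,
    # threshold = Clipping_scale * median of recent gradient norms.
    import torch

    def clipping_threshold(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
        quartiles = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return clipping_scale * quartiles[2].item()
        # e.g. 2.0 * 1.034e+02 = 2.068e+02 and 2.0 * 9.589e+01 = 1.918e+02,
        # matching the thresholds logged above.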
2023-11-27 11:56:23,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3047886.6666666665, ans=0.0
2023-11-27 11:56:32,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0
2023-11-27 11:56:35,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3047953.3333333335, ans=0.0
2023-11-27 11:56:40,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3047953.3333333335, ans=0.0
2023-11-27 11:56:42,228 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457200
2023-11-27 11:56:48,610 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 300, loss[loss=0.06722, simple_loss=0.09119, pruned_loss=0.01176, audio_tagging_loss=0.00987, over 15270.00 frames. ], tot_loss[loss=0.06987, simple_loss=0.0921, pruned_loss=0.01309, audio_tagging_loss=0.01074, over 2373512.99 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 16.0
2023-11-27 11:56:57,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3048020.0, ans=0.125
2023-11-27 11:56:58,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3048086.6666666665, ans=0.125
2023-11-27 11:57:05,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048086.6666666665, ans=0.1
2023-11-27 11:57:13,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3048153.3333333335, ans=0.2
2023-11-27 11:57:18,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3048153.3333333335, ans=0.125
2023-11-27 11:57:19,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3048153.3333333335, ans=0.1
2023-11-27 11:57:20,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3048153.3333333335, ans=0.0
2023-11-27 11:57:26,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3048220.0, ans=0.0
2023-11-27 11:57:29,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3048220.0, ans=0.2
2023-11-27 11:57:31,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0
2023-11-27 11:57:39,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457250
2023-11-27 11:57:42,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3048286.6666666665, ans=0.125
2023-11-27 11:57:43,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0
2023-11-27 11:57:44,934 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 350, loss[loss=0.05416, simple_loss=0.07272, pruned_loss=0.008604, audio_tagging_loss=0.009197, over 15678.00 frames. ], tot_loss[loss=0.06909, simple_loss=0.09179, pruned_loss=0.01309, audio_tagging_loss=0.01011, over 2523648.54 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 8.0
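[Annotation] The ScheduledFloat lines report, for each scheduled hyperparameter, its current value (ans=...) at the current batch_count. A simplified stand-in for such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints as in Zipformer's scaling.py; the class and the breakpoint values below are illustrative, not the actual implementation:

    # Simplified stand-in: piecewise-linear in batch_count, clamped to the
    # first/last breakpoint outside the given range.
    class PiecewiseLinearSchedule:
        def __init__(self, *points: tuple):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # At batch_count ~ 3e6 this run is far past any plausible final breakpoint,
    # so skip-rate schedules have settled at their final value, e.g. ans=0.0
    # for the attention_skip_rate lines above (breakpoints here are made up).
    skip_rate = PiecewiseLinearSchedule((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
    assert skip_rate(3048220.0) == 0.0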
2023-11-27 11:58:07,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=12.0
2023-11-27 11:58:10,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.638e+01 9.224e+01 9.880e+01 1.297e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-27 11:58:14,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0
2023-11-27 11:58:27,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3048553.3333333335, ans=10.0
2023-11-27 11:58:36,112 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457300
2023-11-27 11:58:42,166 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 400, loss[loss=0.06748, simple_loss=0.1033, pruned_loss=0.008988, audio_tagging_loss=0.006839, over 14729.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09131, pruned_loss=0.01307, audio_tagging_loss=0.009825, over 2637982.32 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 16.0
2023-11-27 11:59:04,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3048820.0, ans=0.0
2023-11-27 11:59:06,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.54 vs. limit=10.0
2023-11-27 11:59:14,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3048886.6666666665, ans=0.2
2023-11-27 11:59:19,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3048886.6666666665, ans=0.125
2023-11-27 11:59:32,734 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457350
2023-11-27 11:59:38,806 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 450, loss[loss=0.04566, simple_loss=0.06145, pruned_loss=0.006451, audio_tagging_loss=0.008485, over 14427.00 frames. ], tot_loss[loss=0.06831, simple_loss=0.0917, pruned_loss=0.01288, audio_tagging_loss=0.009575, over 2726283.46 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 16.0
2023-11-27 11:59:47,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. limit=10.0
2023-11-27 11:59:57,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3049086.6666666665, ans=0.125
2023-11-27 12:00:02,792 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.448e+01 9.046e+01 9.688e+01 1.234e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-27 12:00:11,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049220.0, ans=0.1
2023-11-27 12:00:11,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3049220.0, ans=0.0
2023-11-27 12:00:21,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0
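[Annotation] The lr field in the batch lines (1.75e-03 here, easing to 1.74e-03 further down) is consistent with an Eden-style schedule built from the config above (base_lr=0.045, lr_batches=7500, lr_epochs=3.5). A sketch of that formula, under the assumption that this recipe uses icefall's standard Eden rule:

    # Assumed Eden-style schedule: lr decays in both batch index and epoch.
    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Around batch_idx_train ~ 457000 and epoch 38-39 this gives ~1.75e-03,
    # in line with the lr values logged above.
    print(eden_lr(0.045, 457000, 38))  # ~0.00175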
2023-11-27 12:00:29,400 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457400
2023-11-27 12:00:35,341 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 500, loss[loss=0.06879, simple_loss=0.09523, pruned_loss=0.01551, audio_tagging_loss=0.005667, over 14547.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.0913, pruned_loss=0.01279, audio_tagging_loss=0.009413, over 2797828.70 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:00:42,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3049353.3333333335, ans=0.07
2023-11-27 12:00:50,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3049420.0, ans=0.0
2023-11-27 12:00:51,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3049420.0, ans=0.04949747468305833
2023-11-27 12:01:21,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3049620.0, ans=0.125
2023-11-27 12:01:22,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3049620.0, ans=0.1
2023-11-27 12:01:25,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457450
2023-11-27 12:01:26,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3049620.0, ans=0.125
2023-11-27 12:01:30,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3049686.6666666665, ans=0.125
2023-11-27 12:01:32,370 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 550, loss[loss=0.0582, simple_loss=0.08186, pruned_loss=0.008249, audio_tagging_loss=0.009021, over 15462.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09065, pruned_loss=0.01276, audio_tagging_loss=0.009319, over 2847348.57 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:01:35,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0
2023-11-27 12:01:54,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3049820.0, ans=0.125
2023-11-27 12:01:56,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.636e+01 9.349e+01 1.005e+02 1.321e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-27 12:02:08,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0
2023-11-27 12:02:23,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457500
2023-11-27 12:02:28,714 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 600, loss[loss=0.07466, simple_loss=0.09955, pruned_loss=0.01267, audio_tagging_loss=0.01222, over 16080.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09074, pruned_loss=0.01274, audio_tagging_loss=0.00934, over 2896546.30 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:02:28,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3050020.0, ans=0.2
2023-11-27 12:02:37,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3050020.0, ans=0.0
2023-11-27 12:02:43,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3050086.6666666665, ans=0.0
2023-11-27 12:02:51,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3050153.3333333335, ans=0.125
2023-11-27 12:03:15,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3050286.6666666665, ans=15.0
2023-11-27 12:03:20,402 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457550
2023-11-27 12:03:25,688 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 650, loss[loss=0.06743, simple_loss=0.08648, pruned_loss=0.0125, audio_tagging_loss=0.01169, over 15840.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09071, pruned_loss=0.01274, audio_tagging_loss=0.009251, over 2924373.01 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:03:37,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3050420.0, ans=0.0
2023-11-27 12:03:43,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3050420.0, ans=10.0
2023-11-27 12:03:44,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3050420.0, ans=0.5
2023-11-27 12:03:52,318 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.673e+01 9.304e+01 1.013e+02 1.216e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 12:03:54,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0
2023-11-27 12:04:04,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3050553.3333333335, ans=0.0
2023-11-27 12:04:07,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3050553.3333333335, ans=0.95
2023-11-27 12:04:12,910 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:04:17,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457600
2023-11-27 12:04:22,940 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 700, loss[loss=0.07027, simple_loss=0.09562, pruned_loss=0.01331, audio_tagging_loss=0.009149, over 15320.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09161, pruned_loss=0.01274, audio_tagging_loss=0.00915, over 2952715.29 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:04:34,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3050753.3333333335, ans=0.1
2023-11-27 12:04:53,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3050820.0, ans=0.2
2023-11-27 12:04:59,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3050886.6666666665, ans=0.0
2023-11-27 12:05:15,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457650
2023-11-27 12:05:20,549 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 750, loss[loss=0.08966, simple_loss=0.1236, pruned_loss=0.01993, audio_tagging_loss=0.007933, over 15210.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09277, pruned_loss=0.01307, audio_tagging_loss=0.00904, over 2979147.77 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:05:27,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.26 vs. limit=10.0
2023-11-27 12:05:29,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0
2023-11-27 12:05:35,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3051086.6666666665, ans=0.125
2023-11-27 12:05:41,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3051086.6666666665, ans=0.0
2023-11-27 12:05:46,326 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.728e+01 9.485e+01 1.039e+02 1.301e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-27 12:06:03,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3051220.0, ans=0.125
2023-11-27 12:06:07,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3051286.6666666665, ans=10.0
2023-11-27 12:06:11,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457700
2023-11-27 12:06:12,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3051286.6666666665, ans=0.09899494936611666
2023-11-27 12:06:18,053 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 800, loss[loss=0.05294, simple_loss=0.06878, pruned_loss=0.009436, audio_tagging_loss=0.009113, over 15939.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09248, pruned_loss=0.01305, audio_tagging_loss=0.009052, over 2996667.70 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:06:19,279 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:06:19,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3051353.3333333335, ans=0.0
2023-11-27 12:06:48,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3051486.6666666665, ans=0.04949747468305833
2023-11-27 12:06:50,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-11-27 12:06:52,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3051553.3333333335, ans=0.0
2023-11-27 12:06:55,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2023-11-27 12:06:59,294 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:07:09,785 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457750
2023-11-27 12:07:13,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0
2023-11-27 12:07:14,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3051686.6666666665, ans=0.0
2023-11-27 12:07:15,109 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 850, loss[loss=0.07553, simple_loss=0.1062, pruned_loss=0.01365, audio_tagging_loss=0.008776, over 15367.00 frames. ], tot_loss[loss=0.06876, simple_loss=0.09264, pruned_loss=0.01329, audio_tagging_loss=0.009159, over 3006142.72 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:07:22,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=22.5
2023-11-27 12:07:26,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0
2023-11-27 12:07:41,538 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.800e+01 8.486e+01 9.072e+01 9.874e+01 1.616e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-27 12:07:42,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3051820.0, ans=0.0
2023-11-27 12:07:43,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5
2023-11-27 12:07:44,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0
2023-11-27 12:07:44,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0
2023-11-27 12:07:59,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3051886.6666666665, ans=0.1
2023-11-27 12:08:08,048 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457800
2023-11-27 12:08:09,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3051953.3333333335, ans=0.125
2023-11-27 12:08:14,014 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 900, loss[loss=0.07115, simple_loss=0.09889, pruned_loss=0.01446, audio_tagging_loss=0.00725, over 14865.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09211, pruned_loss=0.01311, audio_tagging_loss=0.009235, over 3014303.66 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:08:19,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3052020.0, ans=0.125
2023-11-27 12:08:24,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3052086.6666666665, ans=0.125
2023-11-27 12:08:40,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052153.3333333335, ans=0.1
2023-11-27 12:08:50,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3052220.0, ans=0.125
2023-11-27 12:08:52,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3052220.0, ans=0.2
2023-11-27 12:09:05,798 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457850
2023-11-27 12:09:05,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3052286.6666666665, ans=0.95
2023-11-27 12:09:11,247 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 950, loss[loss=0.07784, simple_loss=0.1086, pruned_loss=0.01646, audio_tagging_loss=0.007091, over 14847.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09301, pruned_loss=0.0133, audio_tagging_loss=0.009043, over 3023430.95 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:09:21,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052420.0, ans=0.1
2023-11-27 12:09:28,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3052420.0, ans=0.125
2023-11-27 12:09:38,363 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.734e+01 9.231e+01 1.017e+02 1.316e+02, threshold=1.846e+02, percent-clipped=0.0
2023-11-27 12:09:44,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. limit=10.0
2023-11-27 12:09:46,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3052553.3333333335, ans=0.0
2023-11-27 12:10:02,992 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457900
2023-11-27 12:10:08,363 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1000, loss[loss=0.1027, simple_loss=0.1548, pruned_loss=0.02129, audio_tagging_loss=0.004012, over 15767.00 frames. ], tot_loss[loss=0.06864, simple_loss=0.09302, pruned_loss=0.01325, audio_tagging_loss=0.008884, over 3024565.42 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:10:30,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=15.0
2023-11-27 12:10:34,297 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:10:36,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3052820.0, ans=0.1
2023-11-27 12:10:44,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3052886.6666666665, ans=0.125
2023-11-27 12:10:53,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052953.3333333335, ans=0.1
2023-11-27 12:10:59,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3052953.3333333335, ans=0.05
2023-11-27 12:11:00,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457950
2023-11-27 12:11:05,653 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1050, loss[loss=0.0651, simple_loss=0.08215, pruned_loss=0.01436, audio_tagging_loss=0.009659, over 15637.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09334, pruned_loss=0.0132, audio_tagging_loss=0.008683, over 3032646.86 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:11:32,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.482e+01 9.051e+01 9.992e+01 1.169e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-27 12:11:53,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3053286.6666666665, ans=0.125
2023-11-27 12:11:56,562 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458000
2023-11-27 12:12:02,645 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1100, loss[loss=0.04202, simple_loss=0.05118, pruned_loss=0.00397, audio_tagging_loss=0.01246, over 14281.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09218, pruned_loss=0.01303, audio_tagging_loss=0.008707, over 3030102.06 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:12:04,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3053353.3333333335, ans=0.125
2023-11-27 12:12:04,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3053353.3333333335, ans=0.1
2023-11-27 12:12:07,033 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
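[Annotation] These WARNING lines drop dummy-transcript AudioSet cuts whose subsampled frame count is shorter than their token sequence. A sketch of the check, assuming the usual icefall convolutional-subsampling arithmetic (the filter function name is illustrative, not from train_asr.py):

    # Illustrative cut filter: a 1.0 s cut has 100 feature frames; after the
    # Conv2d subsampling front-end that becomes ((100 - 7) // 2 + 1) // 2 = 23
    # frames, fewer than the 24 BPE tokens of the dummy transcript, so the cut
    # is excluded from training.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        t = ((num_frames - 7) // 2 + 1) // 2  # frames after subsampling
        return t >= num_tokens

    assert keep_cut(100, 24) is False  # matches the WARNING above: 23 < 24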
2023-11-27 12:12:24,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3053486.6666666665, ans=0.125
2023-11-27 12:12:29,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3053486.6666666665, ans=0.125
2023-11-27 12:12:34,599 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:12:36,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3053553.3333333335, ans=0.0
2023-11-27 12:12:39,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3053553.3333333335, ans=0.1
2023-11-27 12:12:54,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458050
2023-11-27 12:12:59,556 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1150, loss[loss=0.09044, simple_loss=0.1234, pruned_loss=0.01918, audio_tagging_loss=0.009553, over 15830.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09243, pruned_loss=0.01315, audio_tagging_loss=0.008651, over 3035363.66 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:12:59,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3053686.6666666665, ans=0.1
2023-11-27 12:13:27,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.562e+01 9.025e+01 9.989e+01 1.460e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-27 12:13:30,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3053820.0, ans=0.1
2023-11-27 12:13:34,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3053886.6666666665, ans=0.125
2023-11-27 12:13:50,836 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458100
2023-11-27 12:13:56,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3054020.0, ans=0.125
2023-11-27 12:13:57,000 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1200, loss[loss=0.06469, simple_loss=0.08461, pruned_loss=0.01279, audio_tagging_loss=0.009592, over 14372.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09128, pruned_loss=0.01298, audio_tagging_loss=0.008658, over 3028149.17 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:14:03,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3054020.0, ans=0.125
2023-11-27 12:14:47,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458150
2023-11-27 12:14:50,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0
2023-11-27 12:14:53,242 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1250, loss[loss=0.07789, simple_loss=0.1047, pruned_loss=0.017, audio_tagging_loss=0.008557, over 15562.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.0906, pruned_loss=0.01281, audio_tagging_loss=0.008644, over 3024909.07 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:15:02,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3054353.3333333335, ans=0.2
2023-11-27 12:15:05,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3054420.0, ans=0.125
2023-11-27 12:15:17,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3054486.6666666665, ans=0.125
2023-11-27 12:15:20,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3054486.6666666665, ans=0.2
2023-11-27 12:15:21,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.636e+01 9.164e+01 9.922e+01 1.354e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-27 12:15:33,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3054553.3333333335, ans=0.0
2023-11-27 12:15:38,783 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:15:40,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3054620.0, ans=0.0
2023-11-27 12:15:44,071 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458200
2023-11-27 12:15:50,097 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1300, loss[loss=0.0495, simple_loss=0.05881, pruned_loss=0.007728, audio_tagging_loss=0.01237, over 14657.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09058, pruned_loss=0.01276, audio_tagging_loss=0.008641, over 3031885.92 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:15:51,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3054686.6666666665, ans=0.0
2023-11-27 12:16:05,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3054753.3333333335, ans=0.0
2023-11-27 12:16:12,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3054820.0, ans=0.0
2023-11-27 12:16:14,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3054820.0, ans=0.0
2023-11-27 12:16:31,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3054886.6666666665, ans=0.125
2023-11-27 12:16:34,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3054953.3333333335, ans=0.125
2023-11-27 12:16:37,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0
2023-11-27 12:16:40,676 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458250
2023-11-27 12:16:46,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0
2023-11-27 12:16:46,971 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1350, loss[loss=0.05892, simple_loss=0.07548, pruned_loss=0.0113, audio_tagging_loss=0.009881, over 14562.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08974, pruned_loss=0.01267, audio_tagging_loss=0.008688, over 3032376.78 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:16:49,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=22.5
2023-11-27 12:17:00,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3055086.6666666665, ans=0.2
2023-11-27 12:17:12,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3055153.3333333335, ans=0.125
2023-11-27 12:17:13,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.626e+01 9.162e+01 9.999e+01 1.247e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-27 12:17:31,790 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:17:32,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0
2023-11-27 12:17:36,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5
2023-11-27 12:17:38,507 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458300
2023-11-27 12:17:43,923 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1400, loss[loss=0.06076, simple_loss=0.08598, pruned_loss=0.01223, audio_tagging_loss=0.005543, over 16022.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08978, pruned_loss=0.01257, audio_tagging_loss=0.008825, over 3038442.53 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:18:03,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2023-11-27 12:18:05,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3055486.6666666665, ans=0.0
2023-11-27 12:18:19,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0
2023-11-27 12:18:23,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3055553.3333333335, ans=0.5
2023-11-27 12:18:24,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3055553.3333333335, ans=0.0
2023-11-27 12:18:29,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.00 vs. limit=22.5
2023-11-27 12:18:35,176 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458350
2023-11-27 12:18:40,525 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1450, loss[loss=0.08929, simple_loss=0.112, pruned_loss=0.01782, audio_tagging_loss=0.01547, over 16874.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08979, pruned_loss=0.01249, audio_tagging_loss=0.008977, over 3039003.54 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:18:49,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0
2023-11-27 12:19:05,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3055820.0, ans=0.1
2023-11-27 12:19:06,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0
2023-11-27 12:19:08,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.606e+01 9.355e+01 1.005e+02 1.686e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-27 12:19:18,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3055886.6666666665, ans=0.0
2023-11-27 12:19:30,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3055953.3333333335, ans=0.125
2023-11-27 12:19:31,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458400
2023-11-27 12:19:37,790 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1500, loss[loss=0.06575, simple_loss=0.07803, pruned_loss=0.01373, audio_tagging_loss=0.01301, over 15133.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08917, pruned_loss=0.01245, audio_tagging_loss=0.009088, over 3040797.96 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:19:42,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. limit=10.0
2023-11-27 12:19:43,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3056020.0, ans=0.125
2023-11-27 12:20:07,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3056153.3333333335, ans=0.125
2023-11-27 12:20:10,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3056220.0, ans=0.0
2023-11-27 12:20:27,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3056286.6666666665, ans=0.2
2023-11-27 12:20:30,022 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458450
2023-11-27 12:20:35,427 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1550, loss[loss=0.05223, simple_loss=0.06394, pruned_loss=0.008594, audio_tagging_loss=0.01167, over 15135.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08985, pruned_loss=0.01255, audio_tagging_loss=0.009069, over 3037509.20 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
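[Annotation] grad_scale in the batch lines is the fp16 dynamic loss scale (use_fp16: True in the config): it halves when a step overflows, as in the earlier drop from 32.0 through 16.0 to 8.0, and is periodically grown back, reaching 32.0 again by batch 1600 below. A minimal sketch of the standard PyTorch AMP pattern this corresponds to; model, optimizer, and batch are placeholders, not names from train_asr.py:

    # Standard torch.cuda.amp dynamic loss-scaling step (sketch).
    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler()  # its state dict is what "Loading grad scaler state dict" restores

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()  # gradients are scaled by grad_scale
        scaler.step(optimizer)         # skips the step and halves the scale on overflow
        scaler.update()                # otherwise may grow the scale back over time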
2023-11-27 12:20:36,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3056353.3333333335, ans=0.125
2023-11-27 12:21:01,715 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.639e+01 9.120e+01 9.883e+01 1.538e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 12:21:16,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0
2023-11-27 12:21:26,515 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458500
2023-11-27 12:21:31,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3056686.6666666665, ans=0.0
2023-11-27 12:21:31,979 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1600, loss[loss=0.06115, simple_loss=0.08776, pruned_loss=0.008114, audio_tagging_loss=0.009151, over 15130.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08993, pruned_loss=0.01251, audio_tagging_loss=0.009219, over 3047993.01 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:22:20,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3056953.3333333335, ans=0.125
2023-11-27 12:22:23,400 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458550
2023-11-27 12:22:28,722 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1650, loss[loss=0.079, simple_loss=0.1052, pruned_loss=0.01571, audio_tagging_loss=0.01071, over 15846.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08995, pruned_loss=0.01251, audio_tagging_loss=0.009234, over 3051442.69 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:22:57,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.745e+01 9.326e+01 9.935e+01 1.288e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-27 12:22:57,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3057153.3333333335, ans=0.2
2023-11-27 12:23:17,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3057286.6666666665, ans=0.125
2023-11-27 12:23:21,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5
2023-11-27 12:23:22,263 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458600
2023-11-27 12:23:29,016 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1700, loss[loss=0.07063, simple_loss=0.1051, pruned_loss=0.01159, audio_tagging_loss=0.006476, over 16111.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09011, pruned_loss=0.01274, audio_tagging_loss=0.009201, over 3047316.48 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:23:41,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3057420.0, ans=0.0
2023-11-27 12:24:20,641 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458650
2023-11-27 12:24:25,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3057686.6666666665, ans=0.2
2023-11-27 12:24:26,123 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1750, loss[loss=0.06682, simple_loss=0.08917, pruned_loss=0.0121, audio_tagging_loss=0.01013, over 16069.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09075, pruned_loss=0.01274, audio_tagging_loss=0.009117, over 3043761.27 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:24:31,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. limit=10.0
2023-11-27 12:24:37,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3057753.3333333335, ans=0.2
2023-11-27 12:24:46,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3057753.3333333335, ans=0.1
2023-11-27 12:24:53,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3057820.0, ans=0.125
2023-11-27 12:24:54,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.466e+01 9.121e+01 9.743e+01 1.211e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 12:25:01,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0
2023-11-27 12:25:17,832 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458700
2023-11-27 12:25:23,180 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1800, loss[loss=0.04305, simple_loss=0.05291, pruned_loss=0.007046, audio_tagging_loss=0.009553, over 16606.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09002, pruned_loss=0.01254, audio_tagging_loss=0.009052, over 3039859.92 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:25:46,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3058153.3333333335, ans=0.125
2023-11-27 12:25:48,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3058153.3333333335, ans=0.2
2023-11-27 12:25:54,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3058153.3333333335, ans=0.125
2023-11-27 12:26:09,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3058286.6666666665, ans=0.0
2023-11-27 12:26:16,809 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458750
2023-11-27 12:26:22,255 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1850, loss[loss=0.06466, simple_loss=0.08418, pruned_loss=0.01064, audio_tagging_loss=0.01193, over 14770.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08991, pruned_loss=0.01248, audio_tagging_loss=0.009009, over 3036623.28 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:26:47,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-11-27 12:26:48,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3058486.6666666665, ans=0.125 2023-11-27 12:26:50,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.879e+01 9.441e+01 1.010e+02 1.422e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 12:27:04,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3058553.3333333335, ans=0.125 2023-11-27 12:27:05,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=22.5 2023-11-27 12:27:13,953 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458800 2023-11-27 12:27:20,577 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1900, loss[loss=0.05313, simple_loss=0.06896, pruned_loss=0.01073, audio_tagging_loss=0.007925, over 14262.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08987, pruned_loss=0.01246, audio_tagging_loss=0.008826, over 3035469.10 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:27:24,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2023-11-27 12:27:25,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3058686.6666666665, ans=0.1 2023-11-27 12:27:26,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.39 vs. limit=12.0 2023-11-27 12:27:28,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3058686.6666666665, ans=0.125 2023-11-27 12:28:12,442 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458850 2023-11-27 12:28:17,859 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1950, loss[loss=0.05542, simple_loss=0.07483, pruned_loss=0.008275, audio_tagging_loss=0.009726, over 15805.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08977, pruned_loss=0.01247, audio_tagging_loss=0.008802, over 3031063.13 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 12:28:22,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=15.0 2023-11-27 12:28:46,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.494e+01 8.981e+01 9.864e+01 1.306e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-27 12:29:11,125 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458900 2023-11-27 12:29:13,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3059286.6666666665, ans=0.125 2023-11-27 12:29:13,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3059286.6666666665, ans=0.2 2023-11-27 12:29:14,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3059286.6666666665, ans=0.2 2023-11-27 12:29:16,528 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2000, loss[loss=0.0668, simple_loss=0.08934, pruned_loss=0.0121, audio_tagging_loss=0.01003, over 15286.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0895, pruned_loss=0.01251, audio_tagging_loss=0.008882, over 3034078.49 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:29:23,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-11-27 12:29:34,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3059420.0, ans=0.2 2023-11-27 12:29:36,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-27 12:29:39,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3059486.6666666665, ans=0.04949747468305833 2023-11-27 12:29:50,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3059553.3333333335, ans=0.2 2023-11-27 12:30:03,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=22.5 2023-11-27 12:30:08,139 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458950 2023-11-27 12:30:13,673 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2050, loss[loss=0.06781, simple_loss=0.09777, pruned_loss=0.01141, audio_tagging_loss=0.007522, over 16569.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08906, pruned_loss=0.01261, audio_tagging_loss=0.008889, over 3034082.88 frames. 
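[Editor's note] The [optim.py:476] entries above report running quartiles (min/25%/median/75%/max) of recent gradient norms, and in every entry the logged threshold equals Clipping_scale times the running median, e.g. 2.0 * 8.981e+01 ~= 1.796e+02 in the 12:28:46 entry; percent-clipped then reports how often recent batches exceeded it. Below is a minimal sketch of that bookkeeping, assuming a simple rolling window; it illustrates the relationship visible in the log, not the actual optim.py implementation.

    import torch

    class QuartileClipper:
        """Clip gradients at scale * running-median of recent grad norms (sketch)."""

        def __init__(self, scale: float = 2.0, window: int = 1024):
            self.scale = scale   # corresponds to Clipping_scale in the log
            self.window = window
            self.norms = []      # recent total gradient norms

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            if not grads:
                return 0.0
            # Total gradient norm over all parameters.
            norm = torch.linalg.vector_norm(
                torch.stack([torch.linalg.vector_norm(g) for g in grads])
            ).item()
            self.norms = (self.norms + [norm])[-self.window :]
            q = torch.quantile(
                torch.tensor(self.norms),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
            )  # the five "grad-norm quartiles" printed in the log
            threshold = self.scale * q[2].item()  # scale x running median
            if norm > threshold:  # such batches are counted in percent-clipped
                for g in grads:
                    g.mul_(threshold / norm)
            return norm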
2023-11-27 12:30:14,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3059686.6666666665, ans=0.125
2023-11-27 12:30:32,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059753.3333333335, ans=0.1
2023-11-27 12:30:41,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3059820.0, ans=0.0
2023-11-27 12:30:43,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.669e+01 9.305e+01 9.867e+01 1.531e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 12:30:49,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3059886.6666666665, ans=0.0
2023-11-27 12:31:01,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0
2023-11-27 12:31:06,126 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459000
2023-11-27 12:31:06,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3059953.3333333335, ans=10.0
2023-11-27 12:31:12,192 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2100, loss[loss=0.05298, simple_loss=0.06889, pruned_loss=0.008922, audio_tagging_loss=0.009617, over 15710.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08956, pruned_loss=0.01268, audio_tagging_loss=0.008799, over 3038970.74 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:31:34,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3060153.3333333335, ans=0.0
2023-11-27 12:31:40,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3060153.3333333335, ans=0.125
2023-11-27 12:31:48,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3060220.0, ans=0.125
2023-11-27 12:32:03,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3060286.6666666665, ans=0.0
2023-11-27 12:32:04,020 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459050
2023-11-27 12:32:10,920 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2150, loss[loss=0.06758, simple_loss=0.09234, pruned_loss=0.01402, audio_tagging_loss=0.007385, over 15017.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08967, pruned_loss=0.01272, audio_tagging_loss=0.00879, over 3041986.68 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:32:16,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3060353.3333333335, ans=0.125
2023-11-27 12:32:29,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3060420.0, ans=0.125
2023-11-27 12:32:39,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.486e+01 9.121e+01 9.969e+01 1.223e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 12:32:44,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3060553.3333333335, ans=0.125
2023-11-27 12:32:47,743 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:33:02,287 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459100
2023-11-27 12:33:07,660 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2200, loss[loss=0.06535, simple_loss=0.09124, pruned_loss=0.01162, audio_tagging_loss=0.008113, over 15796.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09008, pruned_loss=0.0127, audio_tagging_loss=0.008655, over 3045863.73 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:33:12,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0
2023-11-27 12:33:16,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3060686.6666666665, ans=0.125
2023-11-27 12:33:22,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3060753.3333333335, ans=0.125
2023-11-27 12:33:31,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=22.5
2023-11-27 12:33:39,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3060820.0, ans=0.2
2023-11-27 12:33:53,138 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:33:59,431 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459150
2023-11-27 12:34:01,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3060953.3333333335, ans=0.0
2023-11-27 12:34:05,525 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2250, loss[loss=0.06081, simple_loss=0.08363, pruned_loss=0.01247, audio_tagging_loss=0.006533, over 14899.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09081, pruned_loss=0.01277, audio_tagging_loss=0.008744, over 3043633.47 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:34:20,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3061086.6666666665, ans=0.125
2023-11-27 12:34:24,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3061086.6666666665, ans=0.2
2023-11-27 12:34:36,082 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.598e+01 9.189e+01 9.920e+01 1.225e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-27 12:34:36,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3061153.3333333335, ans=0.2
2023-11-27 12:34:57,809 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459200
2023-11-27 12:34:58,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3061286.6666666665, ans=0.0
2023-11-27 12:34:58,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3061286.6666666665, ans=0.0
2023-11-27 12:35:05,896 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2300, loss[loss=0.08395, simple_loss=0.109, pruned_loss=0.01802, audio_tagging_loss=0.01143, over 17110.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09102, pruned_loss=0.01276, audio_tagging_loss=0.008728, over 3051793.82 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:35:12,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3061353.3333333335, ans=0.125
2023-11-27 12:35:29,053 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:35:32,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3061486.6666666665, ans=0.0
2023-11-27 12:35:38,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3061553.3333333335, ans=0.125
2023-11-27 12:35:44,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0
2023-11-27 12:35:53,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0
2023-11-27 12:35:57,618 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459250
2023-11-27 12:35:59,772 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:36:03,025 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2350, loss[loss=0.06299, simple_loss=0.07473, pruned_loss=0.01164, audio_tagging_loss=0.01398, over 14802.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09096, pruned_loss=0.01268, audio_tagging_loss=0.008901, over 3043918.03 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
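[Editor's note] The WARNING entries above ([train_asr.py:1481]) record 1-second AudioSet clips dropped because a transducer loss cannot align fewer encoder frames than text tokens: 100 input frames become 23 after the frontend's roughly 4x subsampling, which is less than the 24 BPE tokens of the placeholder transcript. A hedged sketch of that kind of validity check follows; the cut/sp objects (Lhotse cut, SentencePiece model) and the exact wording are assumptions for illustration, not the actual train_asr.py code.

    import logging

    def is_valid_cut(cut, sp) -> bool:
        """Return False for cuts too short to align against their token sequence."""
        num_frames = cut.num_frames  # frames before subsampling, e.g. 100
        # Frames surviving the convolutional frontend's subsampling
        # (this arithmetic reproduces the log's 100 -> 23).
        T = ((num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        if T < len(tokens):
            logging.warning(
                f"Exclude cut with ID {cut.id} from training. "
                f"Number of frames (after subsampling): {T}. "
                f"Number of tokens: {len(tokens)}"
            )
            return False
        return True

In a Lhotse pipeline a predicate like this would typically be applied as train_cuts.filter(lambda c: is_valid_cut(c, sp)) before the sampler is built, or on the fly per batch as the warnings here suggest.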
2023-11-27 12:36:07,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3061686.6666666665, ans=0.2
2023-11-27 12:36:13,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3061753.3333333335, ans=0.0
2023-11-27 12:36:17,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3061753.3333333335, ans=0.2
2023-11-27 12:36:28,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0
2023-11-27 12:36:34,192 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.777e+01 9.429e+01 1.022e+02 1.253e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-27 12:36:34,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3061820.0, ans=0.125
2023-11-27 12:36:34,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3061820.0, ans=0.1
2023-11-27 12:36:55,283 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459300
2023-11-27 12:37:00,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3062020.0, ans=0.125
2023-11-27 12:37:00,897 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2400, loss[loss=0.06047, simple_loss=0.07893, pruned_loss=0.01048, audio_tagging_loss=0.01052, over 14272.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09158, pruned_loss=0.01283, audio_tagging_loss=0.008929, over 3041197.13 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:37:04,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3062020.0, ans=0.125
2023-11-27 12:37:15,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3062086.6666666665, ans=0.1
2023-11-27 12:37:23,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3062086.6666666665, ans=0.125
2023-11-27 12:37:26,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3062153.3333333335, ans=10.0
2023-11-27 12:37:28,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3062153.3333333335, ans=0.125
2023-11-27 12:37:36,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3062220.0, ans=10.0
2023-11-27 12:37:37,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3062220.0, ans=0.0
2023-11-27 12:37:40,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3062220.0, ans=0.2
2023-11-27 12:37:42,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5
2023-11-27 12:37:48,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3062286.6666666665, ans=0.2
2023-11-27 12:37:52,996 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459350
2023-11-27 12:37:59,589 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2450, loss[loss=0.06824, simple_loss=0.09998, pruned_loss=0.009553, audio_tagging_loss=0.008698, over 15359.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09086, pruned_loss=0.01272, audio_tagging_loss=0.009086, over 3041043.10 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:38:05,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=22.5
2023-11-27 12:38:28,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.321e+01 9.410e+01 9.969e+01 1.274e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-27 12:38:29,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0
2023-11-27 12:38:51,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459400
2023-11-27 12:38:57,350 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2500, loss[loss=0.04812, simple_loss=0.06307, pruned_loss=0.007009, audio_tagging_loss=0.009573, over 14828.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09136, pruned_loss=0.01281, audio_tagging_loss=0.009096, over 3045023.26 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:39:02,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3062686.6666666665, ans=0.125
2023-11-27 12:39:08,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3062753.3333333335, ans=0.125
2023-11-27 12:39:09,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3062753.3333333335, ans=0.0
2023-11-27 12:39:16,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3062753.3333333335, ans=0.0
2023-11-27 12:39:40,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3062886.6666666665, ans=0.0
2023-11-27 12:39:49,328 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459450
2023-11-27 12:39:49,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062953.3333333335, ans=0.1
2023-11-27 12:39:52,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3062953.3333333335, ans=0.0
2023-11-27 12:39:54,774 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2550, loss[loss=0.07243, simple_loss=0.09379, pruned_loss=0.01434, audio_tagging_loss=0.01119, over 15140.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.0914, pruned_loss=0.01283, audio_tagging_loss=0.008962, over 3054010.25 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:40:06,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0
2023-11-27 12:40:25,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.55 vs. limit=15.0
2023-11-27 12:40:25,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3063153.3333333335, ans=0.125
2023-11-27 12:40:26,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.678e+01 9.247e+01 1.003e+02 1.223e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 12:40:30,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3063220.0, ans=0.125
2023-11-27 12:40:46,503 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459500
2023-11-27 12:40:51,927 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2600, loss[loss=0.05214, simple_loss=0.06423, pruned_loss=0.008641, audio_tagging_loss=0.01138, over 15256.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09055, pruned_loss=0.01279, audio_tagging_loss=0.008837, over 3043923.73 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:41:02,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3063353.3333333335, ans=0.125
2023-11-27 12:41:02,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3063353.3333333335, ans=0.125
2023-11-27 12:41:35,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3063553.3333333335, ans=0.125
2023-11-27 12:41:45,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459550
2023-11-27 12:41:49,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3063686.6666666665, ans=0.125
2023-11-27 12:41:50,868 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2650, loss[loss=0.07849, simple_loss=0.1019, pruned_loss=0.01677, audio_tagging_loss=0.01079, over 15456.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09153, pruned_loss=0.01299, audio_tagging_loss=0.008751, over 3044708.47 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:41:55,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3063686.6666666665, ans=0.0
2023-11-27 12:41:59,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5
2023-11-27 12:42:07,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3063753.3333333335, ans=0.125
2023-11-27 12:42:07,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3063753.3333333335, ans=0.1
2023-11-27 12:42:13,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3063820.0, ans=0.1
2023-11-27 12:42:17,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3063820.0, ans=0.125
2023-11-27 12:42:20,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.348e+01 9.301e+01 1.026e+02 1.495e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-27 12:42:42,194 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459600
2023-11-27 12:42:48,205 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2700, loss[loss=0.0557, simple_loss=0.0666, pruned_loss=0.01153, audio_tagging_loss=0.01087, over 13794.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09127, pruned_loss=0.01296, audio_tagging_loss=0.008704, over 3043549.29 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:42:52,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3064020.0, ans=0.1
2023-11-27 12:43:24,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3064220.0, ans=0.95
2023-11-27 12:43:31,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3064220.0, ans=0.125
2023-11-27 12:43:39,929 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459650
2023-11-27 12:43:42,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3064286.6666666665, ans=0.0
2023-11-27 12:43:42,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0
2023-11-27 12:43:45,272 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2750, loss[loss=0.07441, simple_loss=0.1005, pruned_loss=0.01712, audio_tagging_loss=0.007059, over 16198.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09062, pruned_loss=0.01288, audio_tagging_loss=0.008792, over 3048370.15 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 16.0
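[Editor's note] The [scaling.py:213] ScheduledFloat entries that dominate this log print the value (ans=...) that a named hyperparameter takes at the current batch_count; skip rates printing ans=0.0 and dropout values printing ans=0.1 this late in training are consistent with schedules that decay as batch_count grows. A minimal sketch of such a piecewise-linear schedule, with hypothetical knot values, assuming this is the general idea rather than the exact scaling.py class:

    def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
        """Piecewise-linear schedule: `points` are (batch_count, value) knots,
        sorted by batch_count, e.g. [(0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0)]
        for a skip-rate that decays to 0 as training progresses (hypothetical)."""
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                # Linear interpolation between the two surrounding knots.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]  # past the last knot, hold the final value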
2023-11-27 12:43:50,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3064353.3333333335, ans=0.0
2023-11-27 12:44:07,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3064420.0, ans=0.125
2023-11-27 12:44:08,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3064486.6666666665, ans=0.0
2023-11-27 12:44:09,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3064486.6666666665, ans=0.2
2023-11-27 12:44:17,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.367e+01 8.947e+01 9.823e+01 1.478e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-27 12:44:38,948 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:44:38,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459700
2023-11-27 12:44:44,993 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2800, loss[loss=0.07089, simple_loss=0.09758, pruned_loss=0.0152, audio_tagging_loss=0.006898, over 14805.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09058, pruned_loss=0.01282, audio_tagging_loss=0.008764, over 3050686.59 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:44:51,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3064686.6666666665, ans=0.125
2023-11-27 12:45:02,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3064753.3333333335, ans=0.125
2023-11-27 12:45:03,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3064753.3333333335, ans=0.05
2023-11-27 12:45:12,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3064820.0, ans=0.125
2023-11-27 12:45:16,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3064886.6666666665, ans=0.125
2023-11-27 12:45:22,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3064886.6666666665, ans=0.125
2023-11-27 12:45:27,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3064886.6666666665, ans=0.125
2023-11-27 12:45:36,808 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459750
2023-11-27 12:45:42,194 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2850, loss[loss=0.04408, simple_loss=0.05257, pruned_loss=0.006088, audio_tagging_loss=0.0117, over 16382.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08981, pruned_loss=0.01282, audio_tagging_loss=0.008764, over 3041233.44 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:45:45,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3065020.0, ans=0.125
2023-11-27 12:45:57,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3065086.6666666665, ans=0.0
2023-11-27 12:45:58,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.11 vs. limit=15.0
2023-11-27 12:46:04,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3065153.3333333335, ans=0.0
2023-11-27 12:46:14,414 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.447e+01 9.117e+01 9.906e+01 1.324e+02, threshold=1.823e+02, percent-clipped=0.0
2023-11-27 12:46:15,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3065153.3333333335, ans=0.2
2023-11-27 12:46:26,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3065220.0, ans=0.1
2023-11-27 12:46:34,153 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459800
2023-11-27 12:46:34,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0
2023-11-27 12:46:35,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3065286.6666666665, ans=0.125
2023-11-27 12:46:40,266 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2900, loss[loss=0.06719, simple_loss=0.09786, pruned_loss=0.01196, audio_tagging_loss=0.006299, over 14334.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08929, pruned_loss=0.01259, audio_tagging_loss=0.008727, over 3031935.76 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:46:52,148 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:47:03,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3065486.6666666665, ans=0.125
2023-11-27 12:47:13,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3065486.6666666665, ans=0.0
2023-11-27 12:47:19,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3065553.3333333335, ans=0.035
2023-11-27 12:47:33,574 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459850
2023-11-27 12:47:39,697 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2950, loss[loss=0.06326, simple_loss=0.08777, pruned_loss=0.009137, audio_tagging_loss=0.01024, over 14390.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09029, pruned_loss=0.01275, audio_tagging_loss=0.008795, over 3044175.92 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:47:41,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3065686.6666666665, ans=0.125
2023-11-27 12:47:45,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3065686.6666666665, ans=0.125
2023-11-27 12:47:56,997 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:47:59,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3065753.3333333335, ans=0.0
2023-11-27 12:47:59,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3065753.3333333335, ans=0.0
2023-11-27 12:48:06,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3065820.0, ans=0.125
2023-11-27 12:48:10,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.648e+01 9.263e+01 1.004e+02 1.641e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-27 12:48:19,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3065886.6666666665, ans=0.0
2023-11-27 12:48:27,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3065953.3333333335, ans=0.2
2023-11-27 12:48:30,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3065953.3333333335, ans=0.125
2023-11-27 12:48:32,116 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459900
2023-11-27 12:48:37,508 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3000, loss[loss=0.0685, simple_loss=0.09267, pruned_loss=0.0137, audio_tagging_loss=0.00846, over 17499.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.091, pruned_loss=0.01289, audio_tagging_loss=0.008837, over 3048974.89 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:48:37,509 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-27 12:49:11,850 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.05767, simple_loss=0.05074, pruned_loss=0.005233, audio_tagging_loss=0.02707, over 4681554.00 frames.
2023-11-27 12:49:11,851 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-27 12:49:53,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3066220.0, ans=0.0
2023-11-27 12:50:05,468 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459950
2023-11-27 12:50:10,850 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3050, loss[loss=0.057, simple_loss=0.08143, pruned_loss=0.007221, audio_tagging_loss=0.009068, over 15982.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09046, pruned_loss=0.01281, audio_tagging_loss=0.008845, over 3054284.29 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:50:22,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0
2023-11-27 12:50:32,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3066486.6666666665, ans=0.125
2023-11-27 12:50:39,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3066486.6666666665, ans=0.1
2023-11-27 12:50:42,418 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.816e+01 9.327e+01 1.004e+02 1.225e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-27 12:50:47,030 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:50:48,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066553.3333333335, ans=0.1
2023-11-27 12:50:51,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066553.3333333335, ans=0.1
2023-11-27 12:51:03,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460000
2023-11-27 12:51:11,810 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3100, loss[loss=0.07595, simple_loss=0.0955, pruned_loss=0.01818, audio_tagging_loss=0.01002, over 15433.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09113, pruned_loss=0.013, audio_tagging_loss=0.008886, over 3057259.66 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:51:55,707 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0
2023-11-27 12:52:04,039 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460050
2023-11-27 12:52:09,584 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3150, loss[loss=0.06966, simple_loss=0.09803, pruned_loss=0.01199, audio_tagging_loss=0.008655, over 15996.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09227, pruned_loss=0.01326, audio_tagging_loss=0.008843, over 3057514.49 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:52:23,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3067086.6666666665, ans=0.125
2023-11-27 12:52:42,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.556e+01 9.152e+01 9.802e+01 1.189e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-27 12:53:02,645 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460100
2023-11-27 12:53:08,848 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3200, loss[loss=0.05725, simple_loss=0.07977, pruned_loss=0.008563, audio_tagging_loss=0.008808, over 15104.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09173, pruned_loss=0.01302, audio_tagging_loss=0.009016, over 3055910.06 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
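[Editor's note] The [scaling.py:1022] Whitening entries compare a statistic of a named module's activations against a (possibly scheduled) limit: a metric near 1 indicates a nearly "white" channel covariance, while larger values indicate strongly correlated channels that the module will be nudged away from. One plausible way to compute such a metric is sketched below, measuring how far the grouped channel covariance is from a scaled identity; this is an illustration of the idea, not necessarily the exact scaling.py formula.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels) activations. Returns 1.0 for perfectly
        white features, up to num_channels/num_groups for fully correlated ones."""
        _, num_channels = x.shape
        c = num_channels // num_groups
        x = x.reshape(-1, num_groups, c).transpose(0, 1)  # (groups, frames, c)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]  # (groups, c, c)
        # For cov = sigma^2 * I the ratio below is exactly 1; correlation
        # concentrates energy off the diagonal and drives it toward c.
        num = c * (cov * cov).sum(dim=(1, 2))
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2
        return (num / den).mean().item()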
2023-11-27 12:53:12,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3067353.3333333335, ans=0.2
2023-11-27 12:53:22,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3067420.0, ans=0.2
2023-11-27 12:53:22,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3067420.0, ans=0.125
2023-11-27 12:53:24,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2023-11-27 12:53:29,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3067420.0, ans=0.1
2023-11-27 12:53:30,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=22.5
2023-11-27 12:53:35,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3067486.6666666665, ans=0.125
2023-11-27 12:53:38,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=22.5
2023-11-27 12:53:57,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0
2023-11-27 12:54:01,022 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460150
2023-11-27 12:54:06,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.24 vs. limit=10.0
2023-11-27 12:54:06,397 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3250, loss[loss=0.0796, simple_loss=0.1172, pruned_loss=0.01209, audio_tagging_loss=0.008924, over 15374.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09167, pruned_loss=0.01307, audio_tagging_loss=0.009089, over 3054397.72 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:54:11,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3067686.6666666665, ans=0.0
2023-11-27 12:54:17,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0
2023-11-27 12:54:28,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3067820.0, ans=0.125
2023-11-27 12:54:39,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.843e+01 9.410e+01 1.014e+02 1.334e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-27 12:54:59,304 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460200
2023-11-27 12:55:00,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0
2023-11-27 12:55:05,132 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3300, loss[loss=0.06237, simple_loss=0.08092, pruned_loss=0.01098, audio_tagging_loss=0.01093, over 14749.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09182, pruned_loss=0.01303, audio_tagging_loss=0.009101, over 3055377.41 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:55:08,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=3068020.0, ans=0.1
2023-11-27 12:55:25,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=15.0
2023-11-27 12:55:27,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3068086.6666666665, ans=0.125
2023-11-27 12:55:34,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3068153.3333333335, ans=0.125
2023-11-27 12:55:52,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3068286.6666666665, ans=10.0
2023-11-27 12:55:55,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3068286.6666666665, ans=0.1
2023-11-27 12:55:58,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460250
2023-11-27 12:56:04,631 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3350, loss[loss=0.08413, simple_loss=0.1153, pruned_loss=0.01618, audio_tagging_loss=0.01029, over 15124.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09121, pruned_loss=0.01293, audio_tagging_loss=0.00899, over 3051894.41 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:56:09,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3068353.3333333335, ans=0.125
2023-11-27 12:56:10,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3068353.3333333335, ans=0.1
2023-11-27 12:56:14,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2023-11-27 12:56:36,193 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.562e+01 9.285e+01 9.934e+01 1.474e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-27 12:56:46,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0
2023-11-27 12:56:56,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460300
2023-11-27 12:57:00,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3068620.0, ans=0.2
2023-11-27 12:57:02,295 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3400, loss[loss=0.07425, simple_loss=0.1067, pruned_loss=0.01336, audio_tagging_loss=0.007541, over 15113.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09099, pruned_loss=0.0129, audio_tagging_loss=0.008824, over 3051823.75 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:57:28,577 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:57:28,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0
2023-11-27 12:57:31,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3068820.0, ans=0.2
2023-11-27 12:57:38,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3068886.6666666665, ans=0.1
2023-11-27 12:57:44,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5
2023-11-27 12:57:54,073 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460350
2023-11-27 12:58:00,427 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3450, loss[loss=0.05808, simple_loss=0.07986, pruned_loss=0.009441, audio_tagging_loss=0.008712, over 14888.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09021, pruned_loss=0.01271, audio_tagging_loss=0.008733, over 3049342.13 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:58:03,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3069020.0, ans=0.5
2023-11-27 12:58:04,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=15.0
2023-11-27 12:58:10,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3069020.0, ans=0.025
2023-11-27 12:58:14,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3069086.6666666665, ans=0.125
2023-11-27 12:58:32,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.564e+01 9.034e+01 9.987e+01 1.555e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-27 12:58:44,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2023-11-27 12:58:48,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3069286.6666666665, ans=0.0
2023-11-27 12:58:52,154 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460400
2023-11-27 12:58:52,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3069286.6666666665, ans=0.125
2023-11-27 12:58:58,877 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3500, loss[loss=0.06169, simple_loss=0.08126, pruned_loss=0.0114, audio_tagging_loss=0.009654, over 15031.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09101, pruned_loss=0.01293, audio_tagging_loss=0.008715, over 3051654.73 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:59:30,596 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:59:35,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.34 vs. limit=10.0
2023-11-27 12:59:49,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3069620.0, ans=0.2
2023-11-27 12:59:51,317 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460450
2023-11-27 12:59:56,773 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3550, loss[loss=0.05955, simple_loss=0.08389, pruned_loss=0.009411, audio_tagging_loss=0.008191, over 16329.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09098, pruned_loss=0.013, audio_tagging_loss=0.008772, over 3049876.86 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 13:00:07,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0
2023-11-27 13:00:20,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3069820.0, ans=0.0
2023-11-27 13:00:27,343 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 13:00:28,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3069820.0, ans=0.04949747468305833
2023-11-27 13:00:30,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.573e+01 9.051e+01 9.738e+01 1.451e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-27 13:00:43,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3069953.3333333335, ans=0.0
2023-11-27 13:00:48,470 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460500
2023-11-27 13:00:49,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3069953.3333333335, ans=0.1
2023-11-27 13:00:54,049 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3600, loss[loss=0.06218, simple_loss=0.08248, pruned_loss=0.009137, audio_tagging_loss=0.0118, over 15699.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.0906, pruned_loss=0.01287, audio_tagging_loss=0.008793, over 3054924.61 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 13:01:05,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3070086.6666666665, ans=0.125
2023-11-27 13:01:46,315 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460550
2023-11-27 13:01:52,373 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3650, loss[loss=0.05519, simple_loss=0.07678, pruned_loss=0.011, audio_tagging_loss=0.005796, over 15117.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09173, pruned_loss=0.01305, audio_tagging_loss=0.008723, over 3053039.45 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 13:01:54,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0
2023-11-27 13:02:05,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3070420.0, ans=0.0
2023-11-27 13:02:07,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3070420.0, ans=0.125
2023-11-27 13:02:09,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3070420.0, ans=0.2
2023-11-27 13:02:20,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3070486.6666666665, ans=0.0
2023-11-27 13:02:25,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.646e+01 9.266e+01 1.020e+02 1.594e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-27 13:02:30,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
2023-11-27 13:02:35,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3070553.3333333335, ans=0.1
2023-11-27 13:02:44,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3070620.0, ans=0.125
2023-11-27 13:02:45,659 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460600
2023-11-27 13:02:49,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3070620.0, ans=0.0
2023-11-27 13:02:51,398 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3700, loss[loss=0.06304, simple_loss=0.08914, pruned_loss=0.009265, audio_tagging_loss=0.009206, over 16217.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09146, pruned_loss=0.01293, audio_tagging_loss=0.008732, over 3058513.04 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 13:03:08,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3070753.3333333335, ans=0.0
2023-11-27 13:03:13,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3070820.0, ans=0.0
2023-11-27 13:03:43,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460650
2023-11-27 13:03:48,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3071020.0, ans=0.0
2023-11-27 13:03:48,832 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3750, loss[loss=0.05627, simple_loss=0.07509, pruned_loss=0.009272, audio_tagging_loss=0.009451, over 16801.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09106, pruned_loss=0.01284, audio_tagging_loss=0.008771, over 3052319.74 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 13:04:03,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3071086.6666666665, ans=0.0
2023-11-27 13:04:22,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0
2023-11-27 13:04:23,264 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.737e+01 9.313e+01 1.018e+02 1.236e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-27 13:04:25,736 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 13:04:32,090 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 13:04:33,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3071220.0, ans=0.0
2023-11-27 13:04:40,836 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460700
2023-11-27 13:04:46,921 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3800, loss[loss=0.03974, simple_loss=0.05159, pruned_loss=0.004126, audio_tagging_loss=0.009816, over 15207.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09072, pruned_loss=0.01292, audio_tagging_loss=0.008855, over 3050387.69 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 13:04:49,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3071353.3333333335, ans=0.05
2023-11-27 13:05:24,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3071553.3333333335, ans=0.2
2023-11-27 13:05:36,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3071620.0, ans=0.0
2023-11-27 13:05:36,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3071620.0, ans=0.2
2023-11-27 13:05:40,418 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460750
2023-11-27 13:05:45,985 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3850, loss[loss=0.07328, simple_loss=0.09946, pruned_loss=0.01545, audio_tagging_loss=0.008102, over 15913.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09068, pruned_loss=0.01292, audio_tagging_loss=0.008941, over 3048201.48 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 13:05:48,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0
2023-11-27 13:06:13,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3071820.0, ans=0.1
2023-11-27 13:06:18,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.582e+01 9.212e+01 9.899e+01 1.418e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-27 13:06:19,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3071886.6666666665, ans=0.0
2023-11-27 13:06:20,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3071886.6666666665, ans=0.025
2023-11-27 13:06:27,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3071886.6666666665, ans=0.025
2023-11-27 13:06:34,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3071953.3333333335, ans=0.125
2023-11-27 13:06:37,374 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460800
2023-11-27 13:06:41,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0
2023-11-27 13:06:43,160 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3900, loss[loss=0.05702, simple_loss=0.07284, pruned_loss=0.008688, audio_tagging_loss=0.01191, over 15848.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09136, pruned_loss=0.01299, audio_tagging_loss=0.00895, over 3043588.69 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 13:06:58,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3072086.6666666665, ans=0.1
2023-11-27 13:07:28,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3072286.6666666665, ans=0.0
2023-11-27 13:07:32,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3072286.6666666665, ans=0.025
2023-11-27 13:07:34,941 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460850
2023-11-27 13:07:38,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3072286.6666666665, ans=0.125
2023-11-27 13:07:40,279 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3950, loss[loss=0.05701, simple_loss=0.07586, pruned_loss=0.009963, audio_tagging_loss=0.009114, over 14911.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09153, pruned_loss=0.01294, audio_tagging_loss=0.008932, over 3045780.84 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 13:07:49,036 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 13:07:54,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3072420.0, ans=0.0
2023-11-27 13:07:57,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3072420.0, ans=0.125
2023-11-27 13:08:03,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.47 vs.
limit=15.0 2023-11-27 13:08:06,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3072486.6666666665, ans=0.0 2023-11-27 13:08:15,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.964e+01 9.425e+01 1.000e+02 1.456e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 13:08:17,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2023-11-27 13:08:20,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3072553.3333333335, ans=0.0 2023-11-27 13:08:33,492 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460900 2023-11-27 13:08:39,769 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4000, loss[loss=0.05906, simple_loss=0.07533, pruned_loss=0.009582, audio_tagging_loss=0.01182, over 15082.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09127, pruned_loss=0.01286, audio_tagging_loss=0.009081, over 3044274.05 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:08:41,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3072686.6666666665, ans=0.125 2023-11-27 13:09:31,973 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460950 2023-11-27 13:09:33,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-11-27 13:09:37,345 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4050, loss[loss=0.06706, simple_loss=0.0938, pruned_loss=0.01439, audio_tagging_loss=0.005768, over 15166.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09175, pruned_loss=0.01307, audio_tagging_loss=0.009092, over 3051768.17 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:09:42,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2023-11-27 13:09:43,848 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:09:50,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.38 vs. 
limit=15.0 2023-11-27 13:09:54,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3073086.6666666665, ans=0.125 2023-11-27 13:09:58,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3073153.3333333335, ans=0.05 2023-11-27 13:10:10,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3073153.3333333335, ans=0.125 2023-11-27 13:10:10,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3073153.3333333335, ans=0.125 2023-11-27 13:10:11,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3073220.0, ans=0.0 2023-11-27 13:10:13,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.613e+01 9.165e+01 1.034e+02 1.408e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 13:10:16,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3073220.0, ans=0.125 2023-11-27 13:10:28,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461000 2023-11-27 13:10:34,539 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4100, loss[loss=0.06544, simple_loss=0.09266, pruned_loss=0.009783, audio_tagging_loss=0.009332, over 14214.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09235, pruned_loss=0.01314, audio_tagging_loss=0.009035, over 3054205.08 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:10:47,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3073420.0, ans=0.2 2023-11-27 13:10:51,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3073420.0, ans=0.125 2023-11-27 13:11:17,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3073553.3333333335, ans=0.125 2023-11-27 13:11:21,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3073620.0, ans=0.0 2023-11-27 13:11:24,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3073620.0, ans=0.05 2023-11-27 13:11:24,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3073620.0, ans=0.1 2023-11-27 13:11:26,786 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461050 2023-11-27 13:11:33,387 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4150, loss[loss=0.06825, simple_loss=0.08112, pruned_loss=0.01496, audio_tagging_loss=0.01273, over 15860.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09155, pruned_loss=0.01284, audio_tagging_loss=0.008901, over 3048560.98 frames. 
], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:11:35,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3073686.6666666665, ans=0.1 2023-11-27 13:11:48,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3073753.3333333335, ans=0.1 2023-11-27 13:11:50,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3073753.3333333335, ans=0.125 2023-11-27 13:11:55,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3073820.0, ans=0.0 2023-11-27 13:11:59,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3073820.0, ans=0.125 2023-11-27 13:12:08,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.828e+01 8.439e+01 9.039e+01 1.003e+02 1.334e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-27 13:12:18,763 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:12:25,953 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461100 2023-11-27 13:12:31,248 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4200, loss[loss=0.07306, simple_loss=0.104, pruned_loss=0.01291, audio_tagging_loss=0.008137, over 15187.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09201, pruned_loss=0.01294, audio_tagging_loss=0.008866, over 3049794.79 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:12:42,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3074086.6666666665, ans=0.1 2023-11-27 13:12:43,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3074086.6666666665, ans=0.125 2023-11-27 13:12:58,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074153.3333333335, ans=0.1 2023-11-27 13:13:01,665 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:13:09,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3074220.0, ans=0.125 2023-11-27 13:13:23,347 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461150 2023-11-27 13:13:25,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3074286.6666666665, ans=0.125 2023-11-27 13:13:26,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3074286.6666666665, ans=10.0 2023-11-27 13:13:28,881 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4250, loss[loss=0.07019, simple_loss=0.0972, pruned_loss=0.01197, audio_tagging_loss=0.009626, over 14614.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09302, pruned_loss=0.01305, audio_tagging_loss=0.008725, over 3047397.04 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:13:47,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-27 13:13:57,971 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:14:03,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3074553.3333333335, ans=22.5 2023-11-27 13:14:06,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.652e+01 9.233e+01 9.893e+01 1.518e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 13:14:11,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3074553.3333333335, ans=0.0 2023-11-27 13:14:18,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3074620.0, ans=0.1 2023-11-27 13:14:20,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461200 2023-11-27 13:14:25,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074620.0, ans=0.125 2023-11-27 13:14:27,663 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4300, loss[loss=0.0573, simple_loss=0.07053, pruned_loss=0.01205, audio_tagging_loss=0.009989, over 14824.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09274, pruned_loss=0.01297, audio_tagging_loss=0.008637, over 3046612.79 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:14:31,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3074686.6666666665, ans=0.125 2023-11-27 13:14:41,109 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:14:44,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3074753.3333333335, ans=10.0 2023-11-27 13:14:46,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3074753.3333333335, ans=0.125 2023-11-27 13:14:50,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=15.0 2023-11-27 13:15:18,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0 2023-11-27 13:15:19,940 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461250 2023-11-27 13:15:25,856 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4350, loss[loss=0.07715, simple_loss=0.1036, pruned_loss=0.01696, audio_tagging_loss=0.008407, over 15862.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09219, pruned_loss=0.01288, audio_tagging_loss=0.008641, over 3043379.73 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:15:34,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3075020.0, ans=0.0 2023-11-27 13:16:02,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.774e+01 9.357e+01 9.883e+01 1.317e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 13:16:04,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3075220.0, ans=0.05 2023-11-27 13:16:18,137 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461300 2023-11-27 13:16:23,504 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4400, loss[loss=0.08291, simple_loss=0.1122, pruned_loss=0.02097, audio_tagging_loss=0.005831, over 15541.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.09331, pruned_loss=0.01325, audio_tagging_loss=0.008502, over 3047289.24 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:16:24,743 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:16:28,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3075353.3333333335, ans=0.0 2023-11-27 13:17:05,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.27 vs. 
limit=15.0 2023-11-27 13:17:08,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3075620.0, ans=0.125 2023-11-27 13:17:15,620 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461350 2023-11-27 13:17:16,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3075620.0, ans=0.0 2023-11-27 13:17:21,464 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4450, loss[loss=0.06744, simple_loss=0.0929, pruned_loss=0.01309, audio_tagging_loss=0.007897, over 13226.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09285, pruned_loss=0.01305, audio_tagging_loss=0.008547, over 3049834.74 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:17:22,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2023-11-27 13:17:23,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3075686.6666666665, ans=0.0 2023-11-27 13:17:52,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3075820.0, ans=0.5 2023-11-27 13:17:58,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.640e+01 9.339e+01 1.014e+02 1.202e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 13:18:09,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3075953.3333333335, ans=0.1 2023-11-27 13:18:14,576 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461400 2023-11-27 13:18:20,233 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4500, loss[loss=0.08703, simple_loss=0.1244, pruned_loss=0.01825, audio_tagging_loss=0.00658, over 15627.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09277, pruned_loss=0.0132, audio_tagging_loss=0.008591, over 3055382.21 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:18:25,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2023-11-27 13:18:44,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3076153.3333333335, ans=0.125 2023-11-27 13:18:46,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3076153.3333333335, ans=0.125 2023-11-27 13:18:48,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3076153.3333333335, ans=0.125 2023-11-27 13:19:01,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. 
limit=15.0 2023-11-27 13:19:04,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3076220.0, ans=0.0 2023-11-27 13:19:06,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3076286.6666666665, ans=22.5 2023-11-27 13:19:11,579 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461450 2023-11-27 13:19:17,598 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4550, loss[loss=0.06312, simple_loss=0.09551, pruned_loss=0.0083, audio_tagging_loss=0.007066, over 15036.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09179, pruned_loss=0.01294, audio_tagging_loss=0.008691, over 3055007.36 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:19:31,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-11-27 13:19:32,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3076420.0, ans=0.04949747468305833 2023-11-27 13:19:40,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0 2023-11-27 13:19:54,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.672e+01 9.431e+01 1.029e+02 1.211e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 13:20:04,914 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:20:09,211 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461500 2023-11-27 13:20:14,520 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4600, loss[loss=0.07271, simple_loss=0.09939, pruned_loss=0.01272, audio_tagging_loss=0.0103, over 15053.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09113, pruned_loss=0.01282, audio_tagging_loss=0.008766, over 3059171.32 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:20:17,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3076686.6666666665, ans=0.2 2023-11-27 13:20:22,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3076686.6666666665, ans=0.125 2023-11-27 13:20:35,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. limit=6.0 2023-11-27 13:20:38,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2023-11-27 13:20:45,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. 
limit=15.0 2023-11-27 13:20:49,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3076886.6666666665, ans=0.125 2023-11-27 13:20:49,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3076886.6666666665, ans=0.125 2023-11-27 13:21:00,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3076953.3333333335, ans=0.125 2023-11-27 13:21:08,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461550 2023-11-27 13:21:13,511 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4650, loss[loss=0.0644, simple_loss=0.09837, pruned_loss=0.008599, audio_tagging_loss=0.006617, over 15151.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09091, pruned_loss=0.01286, audio_tagging_loss=0.008954, over 3052799.15 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:21:18,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3077020.0, ans=0.0 2023-11-27 13:21:31,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2023-11-27 13:21:37,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3077153.3333333335, ans=0.125 2023-11-27 13:21:49,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.746e+01 9.217e+01 9.897e+01 1.196e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 13:22:04,841 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461600 2023-11-27 13:22:10,672 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4700, loss[loss=0.06064, simple_loss=0.08002, pruned_loss=0.01092, audio_tagging_loss=0.00971, over 14612.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09092, pruned_loss=0.01292, audio_tagging_loss=0.008992, over 3053053.18 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:22:10,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3077353.3333333335, ans=0.2 2023-11-27 13:22:13,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3077353.3333333335, ans=0.125 2023-11-27 13:22:17,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3077353.3333333335, ans=0.0 2023-11-27 13:22:46,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3077553.3333333335, ans=0.125 2023-11-27 13:22:47,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3077553.3333333335, ans=0.0 2023-11-27 13:23:02,731 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461650 2023-11-27 13:23:02,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3077620.0, ans=0.1 2023-11-27 13:23:08,071 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4750, loss[loss=0.03373, simple_loss=0.03592, pruned_loss=0.006188, audio_tagging_loss=0.009581, over 13874.00 frames. 
], tot_loss[loss=0.06729, simple_loss=0.09065, pruned_loss=0.01289, audio_tagging_loss=0.009066, over 3054906.83 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:23:34,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2023-11-27 13:23:38,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3077820.0, ans=0.0 2023-11-27 13:23:38,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-27 13:23:39,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3077820.0, ans=0.1 2023-11-27 13:23:45,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.817e+01 9.459e+01 1.019e+02 1.212e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 13:23:46,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2023-11-27 13:23:48,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3077886.6666666665, ans=0.0 2023-11-27 13:24:00,646 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461700 2023-11-27 13:24:01,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3077953.3333333335, ans=0.125 2023-11-27 13:24:06,647 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4800, loss[loss=0.07585, simple_loss=0.1097, pruned_loss=0.01311, audio_tagging_loss=0.007901, over 15934.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09116, pruned_loss=0.01283, audio_tagging_loss=0.00905, over 3056084.34 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:24:10,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3078020.0, ans=0.0 2023-11-27 13:24:14,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3078020.0, ans=0.015 2023-11-27 13:24:21,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3078086.6666666665, ans=0.125 2023-11-27 13:24:50,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-27 13:24:56,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3078286.6666666665, ans=0.1 2023-11-27 13:24:57,852 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461750 2023-11-27 13:25:03,190 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4850, loss[loss=0.04935, simple_loss=0.05934, pruned_loss=0.009553, audio_tagging_loss=0.01012, over 15551.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08975, pruned_loss=0.01263, audio_tagging_loss=0.00915, over 3047561.09 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:25:19,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.06 vs. 
limit=22.5 2023-11-27 13:25:26,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3078486.6666666665, ans=0.125 2023-11-27 13:25:36,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3078486.6666666665, ans=0.125 2023-11-27 13:25:40,274 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.650e+01 9.327e+01 1.023e+02 1.195e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 13:25:53,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3078620.0, ans=0.125 2023-11-27 13:25:54,463 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461800 2023-11-27 13:26:00,785 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4900, loss[loss=0.05526, simple_loss=0.08306, pruned_loss=0.007184, audio_tagging_loss=0.006548, over 16953.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09091, pruned_loss=0.01283, audio_tagging_loss=0.009039, over 3050647.14 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:26:13,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3078753.3333333335, ans=0.125 2023-11-27 13:26:29,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3078820.0, ans=0.125 2023-11-27 13:26:52,501 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461850 2023-11-27 13:26:52,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2023-11-27 13:26:55,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.47 vs. limit=6.0 2023-11-27 13:26:57,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3079020.0, ans=0.125 2023-11-27 13:26:58,534 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4950, loss[loss=0.07035, simple_loss=0.09422, pruned_loss=0.01163, audio_tagging_loss=0.0116, over 15307.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09068, pruned_loss=0.0128, audio_tagging_loss=0.008923, over 3041081.51 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:26:58,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3079020.0, ans=0.125 2023-11-27 13:27:08,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3079020.0, ans=0.125 2023-11-27 13:27:20,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3079153.3333333335, ans=0.125 2023-11-27 13:27:24,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3079153.3333333335, ans=0.125 2023-11-27 13:27:34,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.478e+01 9.080e+01 9.742e+01 1.240e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 13:27:39,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3079220.0, ans=0.2 2023-11-27 13:27:39,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3079220.0, ans=0.125 2023-11-27 13:27:50,465 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461900 2023-11-27 13:27:52,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0 2023-11-27 13:27:52,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3079286.6666666665, ans=0.125 2023-11-27 13:27:55,869 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5000, loss[loss=0.07515, simple_loss=0.1073, pruned_loss=0.01442, audio_tagging_loss=0.007074, over 17066.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08944, pruned_loss=0.01254, audio_tagging_loss=0.008853, over 3047633.84 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:27:58,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3079353.3333333335, ans=0.0 2023-11-27 13:28:04,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3079353.3333333335, ans=0.035 2023-11-27 13:28:06,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.83 vs. limit=22.5 2023-11-27 13:28:07,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3079420.0, ans=0.0 2023-11-27 13:28:10,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3079420.0, ans=0.0 2023-11-27 13:28:43,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3079620.0, ans=0.1 2023-11-27 13:28:45,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3079620.0, ans=0.1 2023-11-27 13:28:45,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. 
limit=15.0 2023-11-27 13:28:47,586 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461950 2023-11-27 13:28:52,927 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5050, loss[loss=0.05236, simple_loss=0.07357, pruned_loss=0.00794, audio_tagging_loss=0.007633, over 14936.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09002, pruned_loss=0.01269, audio_tagging_loss=0.008747, over 3042550.91 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:28:55,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=12.0 2023-11-27 13:28:56,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.54 vs. limit=10.0 2023-11-27 13:28:58,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0 2023-11-27 13:29:18,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3079820.0, ans=0.0 2023-11-27 13:29:25,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3079820.0, ans=0.025 2023-11-27 13:29:26,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3079886.6666666665, ans=0.0 2023-11-27 13:29:29,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.853e+01 9.456e+01 1.016e+02 1.610e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 13:29:29,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3079886.6666666665, ans=0.125 2023-11-27 13:29:36,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3079886.6666666665, ans=0.125 2023-11-27 13:29:37,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2023-11-27 13:29:38,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=22.5 2023-11-27 13:29:43,968 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462000 2023-11-27 13:29:50,852 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5100, loss[loss=0.05422, simple_loss=0.06898, pruned_loss=0.008852, audio_tagging_loss=0.01088, over 14627.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08939, pruned_loss=0.01254, audio_tagging_loss=0.008748, over 3041559.78 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:29:51,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3080020.0, ans=0.1 2023-11-27 13:30:03,707 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:30:05,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3080086.6666666665, ans=0.1 2023-11-27 13:30:11,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3080086.6666666665, ans=0.125 2023-11-27 13:30:23,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3080220.0, ans=0.125 2023-11-27 13:30:42,759 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462050 2023-11-27 13:30:48,282 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5150, loss[loss=0.06723, simple_loss=0.09305, pruned_loss=0.01133, audio_tagging_loss=0.009366, over 15647.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08995, pruned_loss=0.01262, audio_tagging_loss=0.008627, over 3041857.78 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:31:06,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3080420.0, ans=0.125 2023-11-27 13:31:25,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3080553.3333333335, ans=0.1 2023-11-27 13:31:26,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.502e+01 9.223e+01 9.833e+01 1.321e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-27 13:31:28,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3080553.3333333335, ans=0.0 2023-11-27 13:31:28,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3080553.3333333335, ans=0.125 2023-11-27 13:31:39,943 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462100 2023-11-27 13:31:42,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3080620.0, ans=0.0 2023-11-27 13:31:45,303 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5200, loss[loss=0.06353, simple_loss=0.08662, pruned_loss=0.01324, audio_tagging_loss=0.006979, over 15181.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09065, pruned_loss=0.01265, audio_tagging_loss=0.008591, over 3046431.65 frames. 
], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:31:48,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3080686.6666666665, ans=0.1 2023-11-27 13:31:55,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3080753.3333333335, ans=0.0 2023-11-27 13:31:57,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3080753.3333333335, ans=0.5 2023-11-27 13:32:00,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3080753.3333333335, ans=0.0 2023-11-27 13:32:14,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3080820.0, ans=0.125 2023-11-27 13:32:36,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462150 2023-11-27 13:32:42,054 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5250, loss[loss=0.08288, simple_loss=0.1162, pruned_loss=0.01786, audio_tagging_loss=0.006944, over 13940.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09051, pruned_loss=0.01264, audio_tagging_loss=0.008643, over 3041115.74 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:32:46,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3081020.0, ans=0.125 2023-11-27 13:32:51,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3081020.0, ans=0.1 2023-11-27 13:32:55,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3081086.6666666665, ans=0.0 2023-11-27 13:33:13,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3081153.3333333335, ans=0.125 2023-11-27 13:33:20,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.787e+01 9.211e+01 1.001e+02 1.224e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 13:33:20,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-27 13:33:34,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5 2023-11-27 13:33:34,938 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462200 2023-11-27 13:33:40,670 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5300, loss[loss=0.05934, simple_loss=0.08436, pruned_loss=0.01032, audio_tagging_loss=0.006834, over 14545.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09024, pruned_loss=0.01262, audio_tagging_loss=0.008702, over 3046472.65 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:33:49,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3081353.3333333335, ans=0.2 2023-11-27 13:34:20,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3081553.3333333335, ans=0.2 2023-11-27 13:34:29,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3081620.0, ans=0.1 2023-11-27 13:34:30,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3081620.0, ans=0.125 2023-11-27 13:34:32,338 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462250 2023-11-27 13:34:37,735 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5350, loss[loss=0.06619, simple_loss=0.0868, pruned_loss=0.0146, audio_tagging_loss=0.008187, over 15037.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08935, pruned_loss=0.0125, audio_tagging_loss=0.008682, over 3047326.01 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:34:44,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3081686.6666666665, ans=0.125 2023-11-27 13:34:52,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3081753.3333333335, ans=0.125 2023-11-27 13:34:56,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3081753.3333333335, ans=0.125 2023-11-27 13:34:57,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3081753.3333333335, ans=0.0 2023-11-27 13:35:12,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3081886.6666666665, ans=22.5 2023-11-27 13:35:12,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3081886.6666666665, ans=0.1 2023-11-27 13:35:16,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.714e+01 9.357e+01 9.937e+01 1.176e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 13:35:18,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.97 vs. limit=15.0 2023-11-27 13:35:29,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462300 2023-11-27 13:35:29,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2023-11-27 13:35:34,980 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5400, loss[loss=0.05818, simple_loss=0.07492, pruned_loss=0.009835, audio_tagging_loss=0.01088, over 15776.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09051, pruned_loss=0.01265, audio_tagging_loss=0.00872, over 3044843.27 frames. 
], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:35:40,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3082020.0, ans=0.0 2023-11-27 13:35:41,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3082020.0, ans=0.0 2023-11-27 13:35:54,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3082086.6666666665, ans=0.125 2023-11-27 13:35:55,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3082086.6666666665, ans=0.2 2023-11-27 13:36:02,362 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:36:27,739 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462350 2023-11-27 13:36:33,155 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5450, loss[loss=0.06189, simple_loss=0.08441, pruned_loss=0.01203, audio_tagging_loss=0.007656, over 15334.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09093, pruned_loss=0.0128, audio_tagging_loss=0.00868, over 3046834.27 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:36:41,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3082353.3333333335, ans=10.0 2023-11-27 13:37:02,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3082486.6666666665, ans=0.1 2023-11-27 13:37:11,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3082553.3333333335, ans=15.0 2023-11-27 13:37:11,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3082553.3333333335, ans=0.1 2023-11-27 13:37:12,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-27 13:37:12,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.623e+01 9.313e+01 1.025e+02 1.327e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 13:37:15,177 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:37:24,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3082620.0, ans=0.125 2023-11-27 13:37:25,515 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462400 2023-11-27 13:37:31,088 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5500, loss[loss=0.06541, simple_loss=0.08978, pruned_loss=0.01185, audio_tagging_loss=0.00867, over 14652.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09105, pruned_loss=0.0128, audio_tagging_loss=0.008763, over 3049980.02 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:37:41,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3082753.3333333335, ans=0.0 2023-11-27 13:37:45,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.96 vs. 
limit=10.0 2023-11-27 13:37:56,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3082820.0, ans=0.125 2023-11-27 13:38:18,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3082953.3333333335, ans=0.2 2023-11-27 13:38:22,992 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462450 2023-11-27 13:38:28,421 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5550, loss[loss=0.06239, simple_loss=0.08896, pruned_loss=0.00976, audio_tagging_loss=0.00815, over 15487.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09066, pruned_loss=0.01255, audio_tagging_loss=0.0088, over 3047292.34 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:38:31,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3083020.0, ans=0.125 2023-11-27 13:38:31,341 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:38:31,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3083020.0, ans=0.125 2023-11-27 13:39:06,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2023-11-27 13:39:09,186 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.682e+01 9.087e+01 1.004e+02 1.719e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-27 13:39:15,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3083286.6666666665, ans=0.125 2023-11-27 13:39:21,213 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462500 2023-11-27 13:39:27,223 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5600, loss[loss=0.0574, simple_loss=0.07431, pruned_loss=0.009793, audio_tagging_loss=0.01045, over 14788.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.091, pruned_loss=0.01268, audio_tagging_loss=0.008961, over 3044606.76 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:39:32,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-27 13:39:42,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3083420.0, ans=0.0 2023-11-27 13:39:45,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3083420.0, ans=0.1 2023-11-27 13:40:12,340 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:40:19,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462550 2023-11-27 13:40:25,092 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5650, loss[loss=0.07999, simple_loss=0.1054, pruned_loss=0.01613, audio_tagging_loss=0.01115, over 15742.00 frames. 
], tot_loss[loss=0.06756, simple_loss=0.09155, pruned_loss=0.01274, audio_tagging_loss=0.009046, over 3045931.03 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:40:29,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3083686.6666666665, ans=0.125 2023-11-27 13:40:30,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3083686.6666666665, ans=0.125 2023-11-27 13:40:57,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3083820.0, ans=0.1 2023-11-27 13:41:03,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2023-11-27 13:41:05,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.523e+01 8.985e+01 9.871e+01 1.258e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-27 13:41:14,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3083953.3333333335, ans=0.125 2023-11-27 13:41:16,872 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462600 2023-11-27 13:41:22,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=22.5 2023-11-27 13:41:22,697 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5700, loss[loss=0.04557, simple_loss=0.05931, pruned_loss=0.004975, audio_tagging_loss=0.01094, over 15724.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09092, pruned_loss=0.01266, audio_tagging_loss=0.009022, over 3045003.71 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:41:42,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3084086.6666666665, ans=0.125 2023-11-27 13:41:43,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3084086.6666666665, ans=0.125 2023-11-27 13:41:49,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3084153.3333333335, ans=0.125 2023-11-27 13:42:14,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462650 2023-11-27 13:42:21,536 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5750, loss[loss=0.05993, simple_loss=0.07762, pruned_loss=0.01058, audio_tagging_loss=0.01054, over 14329.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09108, pruned_loss=0.01278, audio_tagging_loss=0.008982, over 3050586.70 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:42:26,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3084353.3333333335, ans=0.0 2023-11-27 13:42:47,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.87 vs. 
limit=15.0 2023-11-27 13:43:01,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.392e+01 9.189e+01 1.014e+02 1.266e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 13:43:06,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3084553.3333333335, ans=0.2 2023-11-27 13:43:13,307 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462700 2023-11-27 13:43:18,774 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5800, loss[loss=0.05363, simple_loss=0.07821, pruned_loss=0.008147, audio_tagging_loss=0.006379, over 14772.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09175, pruned_loss=0.01288, audio_tagging_loss=0.008768, over 3048666.00 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:43:21,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3084686.6666666665, ans=0.0 2023-11-27 13:43:26,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3084686.6666666665, ans=0.125 2023-11-27 13:43:31,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3084753.3333333335, ans=0.125 2023-11-27 13:43:31,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2023-11-27 13:43:41,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3084820.0, ans=0.125 2023-11-27 13:44:01,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3084886.6666666665, ans=0.125 2023-11-27 13:44:05,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3084953.3333333335, ans=0.125 2023-11-27 13:44:07,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3084953.3333333335, ans=0.1 2023-11-27 13:44:11,173 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462750 2023-11-27 13:44:16,473 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5850, loss[loss=0.07469, simple_loss=0.09168, pruned_loss=0.01922, audio_tagging_loss=0.009624, over 14886.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09057, pruned_loss=0.01273, audio_tagging_loss=0.008672, over 3046554.16 frames. 
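The optim.py:476 lines summarize the recent distribution of gradient norms as min/25%/50%/75%/max plus a clipping threshold. The printed thresholds are consistently 2x the printed median (e.g. 2 x 9.114e+01 = 1.823e+02 in the entry above), matching Clipping_scale=2.0, so the clipping level tracks the median of recent norms instead of being fixed. A simplified stand-in for that bookkeeping (not ScaledAdam's exact implementation):

    import torch

    def clip_by_median(params, norm_history, clipping_scale=2.0):
        """Clip the global grad norm at clipping_scale times the median of
        recently observed grad norms; return quartile stats for logging."""
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        norm_history.append(norm.item())
        q = torch.tensor(norm_history).quantile(
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()   # 2.0 x median
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)           # counted as "percent-clipped"
        return q, threshold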
], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:44:18,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3085020.0, ans=0.0 2023-11-27 13:44:33,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3085086.6666666665, ans=0.125 2023-11-27 13:44:38,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3085086.6666666665, ans=0.0 2023-11-27 13:44:54,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085220.0, ans=0.1 2023-11-27 13:44:57,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.603e+01 9.114e+01 9.888e+01 1.396e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-27 13:44:59,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3085220.0, ans=0.0 2023-11-27 13:45:02,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3085286.6666666665, ans=0.125 2023-11-27 13:45:03,935 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:45:08,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462800 2023-11-27 13:45:14,805 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5900, loss[loss=0.06108, simple_loss=0.08313, pruned_loss=0.01111, audio_tagging_loss=0.008406, over 14908.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09014, pruned_loss=0.01256, audio_tagging_loss=0.008681, over 3037772.74 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:45:21,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3085353.3333333335, ans=0.0 2023-11-27 13:45:31,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3085420.0, ans=0.2 2023-11-27 13:45:54,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3085553.3333333335, ans=0.125 2023-11-27 13:46:07,464 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462850 2023-11-27 13:46:12,867 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5950, loss[loss=0.06202, simple_loss=0.08704, pruned_loss=0.01109, audio_tagging_loss=0.0074, over 16343.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09058, pruned_loss=0.01259, audio_tagging_loss=0.00865, over 3045056.59 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:46:15,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3085686.6666666665, ans=0.0 2023-11-27 13:46:38,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=12.0 2023-11-27 13:46:53,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.25 vs. 
limit=15.0 2023-11-27 13:46:53,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.517e+01 9.163e+01 1.018e+02 1.224e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 13:46:55,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3085886.6666666665, ans=0.125 2023-11-27 13:46:55,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3085886.6666666665, ans=0.2 2023-11-27 13:47:04,813 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462900 2023-11-27 13:47:10,176 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6000, loss[loss=0.07075, simple_loss=0.1003, pruned_loss=0.01145, audio_tagging_loss=0.009131, over 15242.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09048, pruned_loss=0.01278, audio_tagging_loss=0.008632, over 3041510.58 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:47:10,177 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 13:47:31,766 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4672, 3.7983, 4.3317, 3.4027], device='cuda:3') 2023-11-27 13:47:44,798 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.05766, simple_loss=0.05076, pruned_loss=0.005225, audio_tagging_loss=0.02706, over 4681554.00 frames. 2023-11-27 13:47:44,799 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 13:47:49,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3086020.0, ans=0.2 2023-11-27 13:48:29,909 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:48:35,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3086286.6666666665, ans=0.125 2023-11-27 13:48:36,709 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462950 2023-11-27 13:48:42,316 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6050, loss[loss=0.04988, simple_loss=0.06462, pruned_loss=0.007789, audio_tagging_loss=0.009783, over 15260.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08975, pruned_loss=0.01268, audio_tagging_loss=0.008639, over 3047171.24 frames. 
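At batch 6000 the loop pauses training and evaluates on the dev set: the validation line is always averaged over the same 4681554.00 frames, i.e. a fixed held-out set, and the peak CUDA memory (24894MB here) is reported afterwards. A sketch of the pattern, where model(batch) returning (loss, num_frames) is a hypothetical signature:

    import torch

    def validate(model, valid_loader, device):
        model.eval()
        loss_sum, frame_sum = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = model(batch)      # hypothetical signature
                loss_sum += loss.item() * num_frames
                frame_sum += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        return loss_sum / max(frame_sum, 1), mem_mb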
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:48:42,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3086353.3333333335, ans=0.2 2023-11-27 13:49:16,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3086553.3333333335, ans=0.0 2023-11-27 13:49:24,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.673e+01 9.372e+01 1.019e+02 1.327e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 13:49:34,250 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463000 2023-11-27 13:49:36,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3086620.0, ans=0.125 2023-11-27 13:49:36,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3086620.0, ans=0.0 2023-11-27 13:49:40,138 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6100, loss[loss=0.0604, simple_loss=0.07619, pruned_loss=0.01211, audio_tagging_loss=0.0102, over 15116.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08928, pruned_loss=0.01265, audio_tagging_loss=0.00868, over 3044188.23 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:49:42,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3086686.6666666665, ans=0.125 2023-11-27 13:49:48,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2023-11-27 13:49:51,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2023-11-27 13:49:59,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086753.3333333335, ans=0.1 2023-11-27 13:50:20,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3086886.6666666665, ans=0.1 2023-11-27 13:50:31,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3086953.3333333335, ans=0.0 2023-11-27 13:50:32,434 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463050 2023-11-27 13:50:38,826 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6150, loss[loss=0.05823, simple_loss=0.0769, pruned_loss=0.0117, audio_tagging_loss=0.008081, over 14535.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08916, pruned_loss=0.01277, audio_tagging_loss=0.008715, over 3039408.15 frames. 
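The grad_scale column moves between 8.0, 16.0 and 32.0 (it drops from 32.0 back to 16.0 between the two batch lines above). That is the dynamic loss scale of fp16 training: it is halved when a step produces inf/nan gradients and grown back after a run of clean steps. The standard PyTorch pattern, with hypothetical model/loader/optimizer objects:

    import torch

    def train_fp16(model, optimizer, loader, device):
        """One fp16 pass with dynamic loss scaling; the printed scale moves
        in powers of two, like the grad_scale column in the log."""
        scaler = torch.cuda.amp.GradScaler(growth_factor=2.0, backoff_factor=0.5)
        for batch in loader:
            optimizer.zero_grad()
            with torch.cuda.amp.autocast(dtype=torch.float16):
                loss = model(batch)          # hypothetical: returns a scalar loss
            scaler.scale(loss).backward()
            scaler.step(optimizer)           # skipped if inf/nan was found
            scaler.update()                  # backoff (x0.5) or growth (x2.0)
            print("grad_scale:", scaler.get_scale())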
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:50:53,054 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:50:59,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3087086.6666666665, ans=0.09899494936611666 2023-11-27 13:51:04,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3087153.3333333335, ans=0.125 2023-11-27 13:51:09,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087153.3333333335, ans=0.1 2023-11-27 13:51:20,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.580e+01 9.298e+01 1.013e+02 1.298e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 13:51:24,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3087286.6666666665, ans=0.0 2023-11-27 13:51:30,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3087286.6666666665, ans=0.2 2023-11-27 13:51:31,249 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463100 2023-11-27 13:51:36,705 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6200, loss[loss=0.05846, simple_loss=0.07378, pruned_loss=0.009821, audio_tagging_loss=0.01175, over 15390.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08825, pruned_loss=0.01259, audio_tagging_loss=0.008877, over 3033130.69 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:51:59,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087486.6666666665, ans=0.1 2023-11-27 13:52:01,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3087486.6666666665, ans=0.0 2023-11-27 13:52:13,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3087553.3333333335, ans=0.2 2023-11-27 13:52:16,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3087553.3333333335, ans=0.125 2023-11-27 13:52:28,808 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463150 2023-11-27 13:52:34,173 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6250, loss[loss=0.08935, simple_loss=0.1222, pruned_loss=0.01951, audio_tagging_loss=0.008729, over 15170.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08899, pruned_loss=0.01263, audio_tagging_loss=0.009002, over 3043350.69 frames. 
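The per-batch and tot_loss numbers are internally consistent with total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; e.g. in the tot_loss just above, 0.5 * 0.08825 + 0.01259 + 0.008877 = 0.0656, exactly the printed loss. The weights below are inferred from the logged numbers, not read from the training code:

    def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # weights inferred from the logged totals, treated as assumptions
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    print(total_loss(0.08825, 0.01259, 0.008877))  # ~0.06559, log prints 0.0656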
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:52:37,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3087686.6666666665, ans=0.125 2023-11-27 13:52:51,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3087753.3333333335, ans=0.125 2023-11-27 13:52:54,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3087753.3333333335, ans=0.125 2023-11-27 13:52:59,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3087820.0, ans=0.125 2023-11-27 13:53:10,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3087886.6666666665, ans=0.125 2023-11-27 13:53:16,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.844e+01 8.648e+01 9.152e+01 1.003e+02 1.287e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 13:53:17,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087886.6666666665, ans=0.1 2023-11-27 13:53:26,224 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463200 2023-11-27 13:53:32,901 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6300, loss[loss=0.04911, simple_loss=0.07119, pruned_loss=0.005935, audio_tagging_loss=0.007578, over 16659.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08919, pruned_loss=0.01269, audio_tagging_loss=0.009098, over 3051706.07 frames. ], batch size: 65, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:53:45,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3088086.6666666665, ans=0.2 2023-11-27 13:54:01,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3088153.3333333335, ans=0.1 2023-11-27 13:54:03,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-11-27 13:54:25,949 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463250 2023-11-27 13:54:31,600 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6350, loss[loss=0.07776, simple_loss=0.1113, pruned_loss=0.01427, audio_tagging_loss=0.007853, over 15938.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08806, pruned_loss=0.01239, audio_tagging_loss=0.009175, over 3052649.79 frames. 
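tot_loss is not the current batch's loss: it is a frame-weighted average over a window of recent batches, which is why it is reported over ~3.0M frames while the per-batch losses cover ~15k frames. A minimal tracker in that spirit (hypothetical, loosely modeled on icefall's MetricsTracker):

    from collections import deque

    class FrameWeightedTracker:
        """Frame-weighted average of per-batch losses over a sliding window."""
        def __init__(self, max_batches=200):
            self.window = deque(maxlen=max_batches)   # (loss_sum, frames) pairs

        def update(self, loss_value, num_frames):
            self.window.append((loss_value * num_frames, num_frames))

        def average(self):
            total = sum(s for s, _ in self.window)
            frames = sum(f for _, f in self.window)
            return total / max(frames, 1), frames

    tracker = FrameWeightedTracker()
    tracker.update(0.04911, 16659)   # numbers from the batch 6300 line above
    print(tracker.average())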
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:54:36,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3088353.3333333335, ans=0.0 2023-11-27 13:54:40,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3088353.3333333335, ans=0.125 2023-11-27 13:54:40,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3088353.3333333335, ans=0.1 2023-11-27 13:55:00,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3088486.6666666665, ans=0.07 2023-11-27 13:55:13,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.528e+01 9.081e+01 9.909e+01 1.480e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 13:55:13,443 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:55:23,346 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463300 2023-11-27 13:55:25,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3088620.0, ans=0.2 2023-11-27 13:55:28,791 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6400, loss[loss=0.05244, simple_loss=0.07008, pruned_loss=0.009409, audio_tagging_loss=0.007997, over 14795.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08866, pruned_loss=0.01242, audio_tagging_loss=0.009183, over 3052409.51 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:55:43,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-11-27 13:55:43,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3088753.3333333335, ans=0.0 2023-11-27 13:55:56,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3088820.0, ans=0.0 2023-11-27 13:56:07,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3088886.6666666665, ans=0.0 2023-11-27 13:56:13,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3088953.3333333335, ans=0.0 2023-11-27 13:56:16,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3088953.3333333335, ans=0.125 2023-11-27 13:56:20,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463350 2023-11-27 13:56:25,844 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6450, loss[loss=0.0551, simple_loss=0.07559, pruned_loss=0.00952, audio_tagging_loss=0.007785, over 14572.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08981, pruned_loss=0.01242, audio_tagging_loss=0.009127, over 3043933.43 frames. 
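Many of the ScheduledFloat names belong to balancers (balancer1.prob, balancer.max_positive, balancer_na.min_abs, ...). A balancer watches per-channel activation statistics and nudges them back into a configured range by modifying gradients; the sketch below only computes the watched statistics, which is the part the logged names refer to:

    import torch

    def balancer_stats(x: torch.Tensor, channel_dim: int = -1):
        """Per-channel statistics that a balancer-style constraint watches:
        fraction of positive values and mean absolute value. In icefall these
        are steered toward configured [min, max] ranges via the backward pass;
        here we only compute the stats."""
        dims = [d for d in range(x.dim()) if d != channel_dim % x.dim()]
        frac_positive = (x > 0).float().mean(dim=dims)
        mean_abs = x.abs().mean(dim=dims)
        return frac_positive, mean_abs

    x = torch.randn(100, 20, 384)
    fp, ma = balancer_stats(x)
    print(fp.shape, ma.shape)  # torch.Size([384]) torch.Size([384])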
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:56:26,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3089020.0, ans=0.1 2023-11-27 13:56:38,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3089086.6666666665, ans=0.0 2023-11-27 13:56:41,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3089086.6666666665, ans=0.125 2023-11-27 13:56:41,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3089086.6666666665, ans=0.125 2023-11-27 13:57:04,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3089220.0, ans=0.125 2023-11-27 13:57:07,844 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.614e+01 9.257e+01 9.847e+01 1.533e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 13:57:09,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3089220.0, ans=0.0 2023-11-27 13:57:15,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3089286.6666666665, ans=0.1 2023-11-27 13:57:19,459 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463400 2023-11-27 13:57:19,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3089286.6666666665, ans=0.125 2023-11-27 13:57:25,816 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6500, loss[loss=0.07378, simple_loss=0.102, pruned_loss=0.0146, audio_tagging_loss=0.008188, over 14592.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09081, pruned_loss=0.01258, audio_tagging_loss=0.008996, over 3045521.41 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:57:27,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3089353.3333333335, ans=0.125 2023-11-27 13:57:41,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. 
limit=15.0 2023-11-27 13:57:56,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3089486.6666666665, ans=0.125 2023-11-27 13:58:04,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3089553.3333333335, ans=0.04949747468305833 2023-11-27 13:58:12,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3089620.0, ans=0.125 2023-11-27 13:58:16,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3089620.0, ans=0.0 2023-11-27 13:58:17,159 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463450 2023-11-27 13:58:18,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3089620.0, ans=0.125 2023-11-27 13:58:22,704 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6550, loss[loss=0.07252, simple_loss=0.1029, pruned_loss=0.01248, audio_tagging_loss=0.008591, over 14453.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09181, pruned_loss=0.01272, audio_tagging_loss=0.008836, over 3046878.88 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:58:28,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3089686.6666666665, ans=0.1 2023-11-27 13:59:04,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.565e+01 9.161e+01 9.963e+01 1.311e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 13:59:08,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=22.5 2023-11-27 13:59:14,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463500 2023-11-27 13:59:19,697 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6600, loss[loss=0.06363, simple_loss=0.09299, pruned_loss=0.01023, audio_tagging_loss=0.006905, over 15584.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09076, pruned_loss=0.01255, audio_tagging_loss=0.008708, over 3045006.32 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:59:27,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0 2023-11-27 13:59:55,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3090220.0, ans=0.0 2023-11-27 13:59:58,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.07 vs. limit=10.0 2023-11-27 14:00:11,655 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463550 2023-11-27 14:00:17,669 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6650, loss[loss=0.07249, simple_loss=0.09092, pruned_loss=0.01928, audio_tagging_loss=0.007758, over 14166.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09149, pruned_loss=0.0129, audio_tagging_loss=0.008626, over 3042015.37 frames. 
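The Whitening lines compare a metric against a limit (e.g. metric=2.54 vs. limit=15.0 above); the module penalizes feature covariances that become too anisotropic, and no penalty applies while the metric stays under its limit. One way to quantify this, offered as an illustration rather than a claim about the exact formula in scaling.py: the ratio mean(eig^2) / mean(eig)^2 over the covariance eigenvalues, which is 1.0 for perfectly white features and grows as energy concentrates in a few directions.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns 1.0 for white features,
        larger values the more anisotropic the covariance is."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)            # real, ascending
        return (eigs * eigs).mean() / eigs.mean() ** 2

    feats = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated features
    print(whitening_metric(feats))                           # well above 1.0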
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:00:39,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3090486.6666666665, ans=0.125 2023-11-27 14:00:40,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3090486.6666666665, ans=0.0 2023-11-27 14:00:47,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=12.0 2023-11-27 14:00:54,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2023-11-27 14:00:54,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2023-11-27 14:00:58,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.791e+01 9.442e+01 1.009e+02 1.378e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 14:01:09,397 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463600 2023-11-27 14:01:13,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3090620.0, ans=10.0 2023-11-27 14:01:15,149 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6700, loss[loss=0.05684, simple_loss=0.06551, pruned_loss=0.01159, audio_tagging_loss=0.01249, over 15575.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09111, pruned_loss=0.01287, audio_tagging_loss=0.008642, over 3042490.58 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:01:35,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3090753.3333333335, ans=0.02 2023-11-27 14:02:06,791 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463650 2023-11-27 14:02:11,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3091020.0, ans=0.0 2023-11-27 14:02:12,088 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6750, loss[loss=0.07618, simple_loss=0.0987, pruned_loss=0.01565, audio_tagging_loss=0.01118, over 15599.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09028, pruned_loss=0.01282, audio_tagging_loss=0.008723, over 3044287.51 frames. 
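The *_skip_rate schedules (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate, ...) are stochastic-depth-style regularizers: during training the corresponding sub-module's output is dropped with that probability, and the schedules decay it toward the 0.0 values seen above. A minimal sketch of the mechanism (hypothetical helper, not the zipformer code itself):

    import torch

    def maybe_skip(module_out: torch.Tensor, residual: torch.Tensor,
                   skip_rate: float, training: bool) -> torch.Tensor:
        """Stochastic-depth-style skip: with probability skip_rate drop the
        module's contribution and keep only the residual path."""
        if training and skip_rate > 0.0 and torch.rand(()) < skip_rate:
            return residual
        return residual + module_out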
], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:02:32,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3091086.6666666665, ans=0.125 2023-11-27 14:02:33,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3091086.6666666665, ans=0.125 2023-11-27 14:02:36,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-27 14:02:41,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3091153.3333333335, ans=0.125 2023-11-27 14:02:41,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-27 14:02:53,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.433e+01 9.032e+01 9.783e+01 1.125e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-27 14:03:03,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463700 2023-11-27 14:03:10,135 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6800, loss[loss=0.07751, simple_loss=0.107, pruned_loss=0.01722, audio_tagging_loss=0.006761, over 15436.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.0907, pruned_loss=0.01295, audio_tagging_loss=0.008719, over 3039751.69 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:03:10,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3091353.3333333335, ans=0.125 2023-11-27 14:03:26,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3091420.0, ans=0.1 2023-11-27 14:03:41,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3091486.6666666665, ans=0.125 2023-11-27 14:03:56,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3091620.0, ans=0.125 2023-11-27 14:04:01,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463750 2023-11-27 14:04:05,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3091686.6666666665, ans=0.2 2023-11-27 14:04:06,894 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6850, loss[loss=0.06532, simple_loss=0.09619, pruned_loss=0.009702, audio_tagging_loss=0.007521, over 14913.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.0915, pruned_loss=0.01292, audio_tagging_loss=0.008701, over 3042530.45 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:04:15,513 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:04:18,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.32 vs. 
limit=15.0 2023-11-27 14:04:18,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3091753.3333333335, ans=0.125 2023-11-27 14:04:20,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3091753.3333333335, ans=0.125 2023-11-27 14:04:24,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3091753.3333333335, ans=0.2 2023-11-27 14:04:36,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3091820.0, ans=0.0 2023-11-27 14:04:43,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3091886.6666666665, ans=0.04949747468305833 2023-11-27 14:04:44,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0 2023-11-27 14:04:49,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.738e+01 9.106e+01 9.965e+01 1.501e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 14:04:57,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2023-11-27 14:04:59,332 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463800 2023-11-27 14:05:05,180 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6900, loss[loss=0.07704, simple_loss=0.1034, pruned_loss=0.01805, audio_tagging_loss=0.007298, over 14595.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09188, pruned_loss=0.01295, audio_tagging_loss=0.008568, over 3047490.05 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:05:14,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3092020.0, ans=0.1 2023-11-27 14:05:16,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3092086.6666666665, ans=0.0 2023-11-27 14:05:30,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3092153.3333333335, ans=0.0 2023-11-27 14:05:31,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=12.0 2023-11-27 14:05:36,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0 2023-11-27 14:05:43,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3092220.0, ans=0.125 2023-11-27 14:05:51,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3092286.6666666665, ans=0.125 2023-11-27 14:05:53,787 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:05:57,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463850 2023-11-27 14:06:03,940 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6950, loss[loss=0.05288, simple_loss=0.06472, pruned_loss=0.007474, audio_tagging_loss=0.01305, over 13686.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09184, pruned_loss=0.01283, audio_tagging_loss=0.008609, over 3048456.37 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:06:20,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3092420.0, ans=0.2 2023-11-27 14:06:23,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3092420.0, ans=0.125 2023-11-27 14:06:31,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3092486.6666666665, ans=0.125 2023-11-27 14:06:35,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3092486.6666666665, ans=0.5 2023-11-27 14:06:39,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3092553.3333333335, ans=0.125 2023-11-27 14:06:42,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3092553.3333333335, ans=0.125 2023-11-27 14:06:46,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.400e+01 9.204e+01 9.755e+01 1.289e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-27 14:06:54,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3092620.0, ans=0.125 2023-11-27 14:06:55,665 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463900 2023-11-27 14:06:58,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0 2023-11-27 14:07:01,083 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7000, loss[loss=0.1025, simple_loss=0.1261, pruned_loss=0.03207, audio_tagging_loss=0.007374, over 15226.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09199, pruned_loss=0.013, audio_tagging_loss=0.00861, over 3047645.54 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:07:08,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2023-11-27 14:07:23,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.96 vs. limit=10.0 2023-11-27 14:07:51,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092953.3333333335, ans=0.1 2023-11-27 14:07:52,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463950 2023-11-27 14:07:58,830 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7050, loss[loss=0.05723, simple_loss=0.08358, pruned_loss=0.008706, audio_tagging_loss=0.00673, over 17198.00 frames. 
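Both WARNING lines drop 1-second AudioSet clips whose transcript is the 24-token dummy placeholder while only 23 encoder frames survive subsampling; a transducer loss needs at least as many frames as output tokens, so such cuts are infeasible and get filtered out. A sketch of the check, with the front-end shrinkage (T -> (T-7)//4) inferred from the logged 100 -> 23 mapping rather than taken from the code:

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        """Drop cuts that would have fewer encoder frames than target tokens
        after subsampling -- the transducer lattice is infeasible for them."""
        frames_after = (num_frames - 7) // subsampling_factor  # assumed shrinkage
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, as in the WARNING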
], tot_loss[loss=0.06688, simple_loss=0.09085, pruned_loss=0.01276, audio_tagging_loss=0.008692, over 3053088.95 frames. ], batch size: 66, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:08:09,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3093086.6666666665, ans=0.02 2023-11-27 14:08:16,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2023-11-27 14:08:17,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3093086.6666666665, ans=0.125 2023-11-27 14:08:41,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.528e+01 9.043e+01 9.552e+01 1.279e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 14:08:50,147 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464000 2023-11-27 14:08:58,744 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7100, loss[loss=0.07476, simple_loss=0.1009, pruned_loss=0.01553, audio_tagging_loss=0.008794, over 15132.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09129, pruned_loss=0.0128, audio_tagging_loss=0.00876, over 3058969.76 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:09:19,473 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-11-27 14:09:24,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3093486.6666666665, ans=0.0 2023-11-27 14:09:28,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3093486.6666666665, ans=0.125 2023-11-27 14:09:36,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=12.0 2023-11-27 14:09:50,868 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464050 2023-11-27 14:09:56,361 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7150, loss[loss=0.06305, simple_loss=0.08856, pruned_loss=0.01021, audio_tagging_loss=0.008563, over 14819.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09105, pruned_loss=0.01274, audio_tagging_loss=0.008799, over 3064631.41 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:10:21,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2023-11-27 14:10:35,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2023-11-27 14:10:39,817 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.709e+01 9.080e+01 1.002e+02 1.169e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 14:10:43,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3093953.3333333335, ans=0.0 2023-11-27 14:10:47,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464100 2023-11-27 14:10:53,056 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7200, loss[loss=0.06708, simple_loss=0.08833, pruned_loss=0.01169, audio_tagging_loss=0.01122, over 15525.00 frames. 
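The attn_weights_entropy tensors printed during the validation passes (see the 13:47:31 entry further up) are a collapse diagnostic: per-head entropy of the attention distributions, where values near zero would indicate heads locking onto single positions. A sketch of that computation:

    import torch

    def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
        """attn: (num_heads, query_len, key_len), rows are softmax weights.
        Returns the mean entropy per head, in nats."""
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, query_len)
        return ent.mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attention_entropy(attn))  # ~log(50) = 3.9 for near-uniform weights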
], tot_loss[loss=0.06731, simple_loss=0.09121, pruned_loss=0.01287, audio_tagging_loss=0.008838, over 3060402.83 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:10:58,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3094020.0, ans=0.1 2023-11-27 14:11:41,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3094286.6666666665, ans=0.125 2023-11-27 14:11:45,183 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464150 2023-11-27 14:11:50,682 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7250, loss[loss=0.05308, simple_loss=0.06423, pruned_loss=0.008536, audio_tagging_loss=0.01243, over 14949.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09144, pruned_loss=0.0129, audio_tagging_loss=0.008937, over 3049767.35 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:11:59,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3094353.3333333335, ans=0.1 2023-11-27 14:12:01,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3094420.0, ans=0.125 2023-11-27 14:12:20,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3094486.6666666665, ans=0.125 2023-11-27 14:12:27,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3094553.3333333335, ans=0.2 2023-11-27 14:12:33,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3094553.3333333335, ans=0.0 2023-11-27 14:12:34,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.560e+01 9.107e+01 9.786e+01 1.290e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 14:12:43,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464200 2023-11-27 14:12:49,026 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7300, loss[loss=0.08121, simple_loss=0.1125, pruned_loss=0.01755, audio_tagging_loss=0.007425, over 16753.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09139, pruned_loss=0.0128, audio_tagging_loss=0.008823, over 3051066.41 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:12:57,974 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:13:11,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3094820.0, ans=0.1 2023-11-27 14:13:15,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3094820.0, ans=0.05 2023-11-27 14:13:39,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3094953.3333333335, ans=0.0 2023-11-27 14:13:40,351 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464250 2023-11-27 14:13:45,833 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7350, loss[loss=0.05715, simple_loss=0.0846, pruned_loss=0.007307, audio_tagging_loss=0.007542, over 14823.00 frames. 
], tot_loss[loss=0.06672, simple_loss=0.09058, pruned_loss=0.01271, audio_tagging_loss=0.008722, over 3049628.72 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:13:49,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3095020.0, ans=0.09899494936611666 2023-11-27 14:13:49,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3095020.0, ans=0.1 2023-11-27 14:13:55,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3095020.0, ans=0.1 2023-11-27 14:14:00,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-27 14:14:00,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-27 14:14:04,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=15.0 2023-11-27 14:14:29,929 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.691e+01 9.417e+01 9.998e+01 1.354e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 14:14:37,797 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464300 2023-11-27 14:14:43,852 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7400, loss[loss=0.06687, simple_loss=0.08931, pruned_loss=0.01204, audio_tagging_loss=0.01018, over 14305.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09072, pruned_loss=0.01273, audio_tagging_loss=0.00865, over 3045721.28 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:14:44,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0 2023-11-27 14:15:09,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3095486.6666666665, ans=0.0 2023-11-27 14:15:18,546 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:15:25,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3095553.3333333335, ans=0.125 2023-11-27 14:15:36,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464350 2023-11-27 14:15:42,170 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7450, loss[loss=0.07279, simple_loss=0.1021, pruned_loss=0.01561, audio_tagging_loss=0.006145, over 16216.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08998, pruned_loss=0.01255, audio_tagging_loss=0.008672, over 3043524.09 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:15:53,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3095753.3333333335, ans=0.95 2023-11-27 14:15:58,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3095753.3333333335, ans=0.125 2023-11-27 14:16:05,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.11 vs. 
limit=22.5 2023-11-27 14:16:19,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3095886.6666666665, ans=0.125 2023-11-27 14:16:25,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.639e+01 9.279e+01 9.819e+01 1.205e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 14:16:31,539 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:16:31,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3095953.3333333335, ans=0.07 2023-11-27 14:16:32,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2023-11-27 14:16:33,594 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464400 2023-11-27 14:16:39,345 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7500, loss[loss=0.04148, simple_loss=0.05427, pruned_loss=0.004164, audio_tagging_loss=0.01018, over 14815.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09074, pruned_loss=0.01275, audio_tagging_loss=0.008613, over 3050249.31 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:16:40,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096020.0, ans=0.1 2023-11-27 14:16:42,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3096020.0, ans=0.09899494936611666 2023-11-27 14:16:49,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3096086.6666666665, ans=0.125 2023-11-27 14:16:52,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096086.6666666665, ans=0.1 2023-11-27 14:17:09,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096153.3333333335, ans=0.1 2023-11-27 14:17:15,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3096220.0, ans=0.125 2023-11-27 14:17:23,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2023-11-27 14:17:31,914 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464450 2023-11-27 14:17:34,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3096286.6666666665, ans=0.125 2023-11-27 14:17:36,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3096353.3333333335, ans=0.2 2023-11-27 14:17:37,443 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7550, loss[loss=0.04714, simple_loss=0.0665, pruned_loss=0.006339, audio_tagging_loss=0.007547, over 15471.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08957, pruned_loss=0.01264, audio_tagging_loss=0.008601, over 3046773.96 frames. 
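The printed learning rate creeps from 1.74e-03 to 1.73e-03 across these batches because icefall's Eden scheduler decays smoothly in both the batch and epoch counters. The formula below is reproduced from memory and the constants are the usual recipe defaults, so treat both as assumptions; with base_lr=0.045 it gives roughly 1.7e-03 at epoch 39 and batch index ~462500, in line with the printed values.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden schedule (from memory): inverse-quartic-root decay in both
        the batch count and the (possibly fractional) epoch count."""
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(eden_lr(0.045, batch=462500, epoch=39.0))  # ~1.7e-03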
], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:17:42,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3096353.3333333335, ans=0.0 2023-11-27 14:18:20,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=22.5 2023-11-27 14:18:22,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.787e+01 9.439e+01 1.010e+02 1.313e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 14:18:24,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3096620.0, ans=0.125 2023-11-27 14:18:27,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3096620.0, ans=0.125 2023-11-27 14:18:31,257 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464500 2023-11-27 14:18:37,277 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7600, loss[loss=0.04235, simple_loss=0.05525, pruned_loss=0.004966, audio_tagging_loss=0.009763, over 15560.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08867, pruned_loss=0.01268, audio_tagging_loss=0.008768, over 3050445.92 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:18:46,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3096686.6666666665, ans=0.125 2023-11-27 14:19:14,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3096886.6666666665, ans=0.0 2023-11-27 14:19:23,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3096953.3333333335, ans=0.2 2023-11-27 14:19:28,813 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464550 2023-11-27 14:19:34,219 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7650, loss[loss=0.06535, simple_loss=0.08709, pruned_loss=0.01341, audio_tagging_loss=0.0084, over 14758.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08917, pruned_loss=0.01274, audio_tagging_loss=0.008766, over 3049608.39 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:19:40,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3097020.0, ans=0.125 2023-11-27 14:19:51,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3097086.6666666665, ans=0.125 2023-11-27 14:20:11,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3097220.0, ans=0.125 2023-11-27 14:20:16,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3097220.0, ans=0.0 2023-11-27 14:20:18,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.470e+01 8.990e+01 9.726e+01 1.372e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-27 14:20:25,470 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464600 2023-11-27 14:20:31,203 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7700, loss[loss=0.07423, simple_loss=0.09821, pruned_loss=0.01709, audio_tagging_loss=0.008032, over 13368.00 frames. 
], tot_loss[loss=0.06603, simple_loss=0.08888, pruned_loss=0.01277, audio_tagging_loss=0.00883, over 3039590.04 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:20:48,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3097420.0, ans=0.0 2023-11-27 14:20:49,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2023-11-27 14:20:50,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3097420.0, ans=0.0 2023-11-27 14:20:52,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3097420.0, ans=0.035 2023-11-27 14:20:52,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-27 14:21:05,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097553.3333333335, ans=0.1 2023-11-27 14:21:06,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3097553.3333333335, ans=0.125 2023-11-27 14:21:14,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3097553.3333333335, ans=0.125 2023-11-27 14:21:23,374 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464650 2023-11-27 14:21:28,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3097620.0, ans=0.125 2023-11-27 14:21:30,605 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7750, loss[loss=0.08748, simple_loss=0.1137, pruned_loss=0.01863, audio_tagging_loss=0.01199, over 15661.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08977, pruned_loss=0.01279, audio_tagging_loss=0.008848, over 3043874.41 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:21:52,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3097820.0, ans=0.0 2023-11-27 14:22:01,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3097820.0, ans=0.125 2023-11-27 14:22:15,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.645e+01 9.369e+01 1.003e+02 1.399e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 14:22:21,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097953.3333333335, ans=0.1 2023-11-27 14:22:22,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464700 2023-11-27 14:22:27,485 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7800, loss[loss=0.04497, simple_loss=0.06018, pruned_loss=0.006661, audio_tagging_loss=0.008213, over 14196.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08988, pruned_loss=0.01268, audio_tagging_loss=0.008722, over 3036942.40 frames. 
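[Editor's note] The ScheduledFloat records track hyperparameters (dropout probabilities, skip rates, balancer probabilities) that follow piecewise-linear schedules in batch_count; by batch_count near 3.1e6 they have long since reached their final values, e.g. ans=0.1 for the feed-forward dropout above. A simplified stand-in for such a schedule (the breakpoints below are illustrative, not the recipe's settings):

```python
class PiecewiseSchedule:
    """Float that interpolates linearly between (batch_count, value)
    breakpoints and is clamped at both ends; a simplified stand-in for
    the ScheduledFloat values printed in the records above."""

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

dropout_p = PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3096353.0))  # 0.1, far past the final breakpoint
```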
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:22:28,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3098020.0, ans=0.0 2023-11-27 14:22:43,473 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2023-11-27 14:23:03,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3098220.0, ans=0.125 2023-11-27 14:23:09,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3098220.0, ans=0.125 2023-11-27 14:23:12,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3098286.6666666665, ans=0.0 2023-11-27 14:23:15,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2023-11-27 14:23:19,324 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464750 2023-11-27 14:23:24,830 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7850, loss[loss=0.08511, simple_loss=0.116, pruned_loss=0.02011, audio_tagging_loss=0.007005, over 15288.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08981, pruned_loss=0.01276, audio_tagging_loss=0.008779, over 3035321.45 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:23:25,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3098353.3333333335, ans=0.125 2023-11-27 14:23:31,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3098353.3333333335, ans=0.0 2023-11-27 14:24:08,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3098553.3333333335, ans=0.1 2023-11-27 14:24:10,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.600e+01 9.119e+01 9.772e+01 1.362e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 14:24:17,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464800 2023-11-27 14:24:18,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3098620.0, ans=0.125 2023-11-27 14:24:24,264 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7900, loss[loss=0.07307, simple_loss=0.09498, pruned_loss=0.01659, audio_tagging_loss=0.008993, over 15281.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08996, pruned_loss=0.01272, audio_tagging_loss=0.008849, over 3036986.19 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:24:27,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3098686.6666666665, ans=0.2 2023-11-27 14:24:27,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.13 vs. 
limit=15.0 2023-11-27 14:24:39,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3098753.3333333335, ans=0.125 2023-11-27 14:24:44,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3098753.3333333335, ans=0.125 2023-11-27 14:25:07,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2023-11-27 14:25:16,364 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464850 2023-11-27 14:25:22,380 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7950, loss[loss=0.06612, simple_loss=0.09198, pruned_loss=0.01277, audio_tagging_loss=0.007361, over 15797.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09017, pruned_loss=0.01277, audio_tagging_loss=0.008891, over 3036276.09 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:25:38,873 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:25:43,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2023-11-27 14:25:58,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3099220.0, ans=0.07 2023-11-27 14:26:00,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3099220.0, ans=0.0 2023-11-27 14:26:07,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.621e+01 8.980e+01 9.722e+01 1.502e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-27 14:26:09,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2023-11-27 14:26:09,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.87 vs. limit=10.0 2023-11-27 14:26:14,608 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464900 2023-11-27 14:26:20,175 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8000, loss[loss=0.04838, simple_loss=0.05994, pruned_loss=0.00736, audio_tagging_loss=0.01105, over 13800.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09003, pruned_loss=0.01271, audio_tagging_loss=0.008913, over 3038991.06 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:26:22,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.13 vs. 
limit=6.0 2023-11-27 14:26:45,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3099486.6666666665, ans=0.125 2023-11-27 14:27:08,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3099620.0, ans=0.0 2023-11-27 14:27:12,116 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464950 2023-11-27 14:27:18,047 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8050, loss[loss=0.05522, simple_loss=0.06547, pruned_loss=0.00959, audio_tagging_loss=0.0129, over 15908.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08977, pruned_loss=0.01262, audio_tagging_loss=0.008954, over 3039896.14 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:28:01,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=12.0 2023-11-27 14:28:04,068 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.542e+01 9.133e+01 9.654e+01 1.190e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-27 14:28:09,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3099953.3333333335, ans=0.125 2023-11-27 14:28:11,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465000 2023-11-27 14:28:11,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3099953.3333333335, ans=0.125 2023-11-27 14:28:17,187 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8100, loss[loss=0.04515, simple_loss=0.06052, pruned_loss=0.006326, audio_tagging_loss=0.008561, over 15789.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.0898, pruned_loss=0.01266, audio_tagging_loss=0.008922, over 3040488.94 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:28:19,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3100020.0, ans=0.125 2023-11-27 14:28:36,075 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:28:57,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2023-11-27 14:28:58,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3100220.0, ans=0.2 2023-11-27 14:29:09,953 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465050 2023-11-27 14:29:15,404 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8150, loss[loss=0.07257, simple_loss=0.09164, pruned_loss=0.01761, audio_tagging_loss=0.009139, over 15080.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09062, pruned_loss=0.0128, audio_tagging_loss=0.008753, over 3041438.71 frames. 
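[Editor's note] The grad_scale field is the fp16 loss scale (use_fp16: True in the configuration); it is halved whenever a step overflows and grown back after a run of good steps, which is why it alternates between 16.0 and 32.0 across these batches. A sketch of the standard PyTorch AMP loop that yields this behaviour (init_scale and growth_interval here are illustrative, not the recipe's settings):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

# Schematic training step:
# for batch in train_loader:
#     optimizer.zero_grad()
#     with torch.cuda.amp.autocast():
#         loss = compute_loss(model, batch)
#     scaler.scale(loss).backward()
#     scaler.step(optimizer)   # skipped if inf/nan gradients were produced
#     scaler.update()          # halves the scale on overflow, regrows it later
print(scaler.get_scale())      # 32.0 on a CUDA machine
```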
], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:30:01,438 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.694e+01 9.359e+01 9.958e+01 1.190e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 14:30:07,002 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465100 2023-11-27 14:30:12,880 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8200, loss[loss=0.03956, simple_loss=0.05036, pruned_loss=0.007154, audio_tagging_loss=0.007231, over 15417.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09047, pruned_loss=0.0128, audio_tagging_loss=0.008738, over 3044416.83 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:30:17,811 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:30:18,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3100686.6666666665, ans=0.2 2023-11-27 14:30:30,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3100753.3333333335, ans=0.125 2023-11-27 14:30:40,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-11-27 14:31:05,955 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465150 2023-11-27 14:31:11,293 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8250, loss[loss=0.0622, simple_loss=0.08846, pruned_loss=0.01061, audio_tagging_loss=0.007355, over 16134.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09042, pruned_loss=0.01276, audio_tagging_loss=0.008685, over 3046837.03 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:31:42,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-27 14:31:57,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.477e+01 9.033e+01 1.006e+02 1.389e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 14:32:03,108 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465200 2023-11-27 14:32:09,408 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8300, loss[loss=0.06322, simple_loss=0.09133, pruned_loss=0.006547, audio_tagging_loss=0.01101, over 15534.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08993, pruned_loss=0.0127, audio_tagging_loss=0.008772, over 3052317.83 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:32:11,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3101353.3333333335, ans=10.0 2023-11-27 14:32:37,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2023-11-27 14:32:38,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=15.0 2023-11-27 14:32:40,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3101486.6666666665, ans=0.1 2023-11-27 14:33:01,345 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465250 2023-11-27 14:33:06,778 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8350, loss[loss=0.079, simple_loss=0.1062, pruned_loss=0.01721, audio_tagging_loss=0.008712, over 15075.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08957, pruned_loss=0.01268, audio_tagging_loss=0.008914, over 3046676.54 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:33:08,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3101686.6666666665, ans=0.0 2023-11-27 14:33:19,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3101753.3333333335, ans=0.125 2023-11-27 14:33:20,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3101753.3333333335, ans=0.0 2023-11-27 14:33:31,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=12.0 2023-11-27 14:33:44,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0 2023-11-27 14:33:53,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.419e+01 8.984e+01 9.870e+01 1.325e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-27 14:33:59,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465300 2023-11-27 14:34:05,797 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8400, loss[loss=0.07, simple_loss=0.09159, pruned_loss=0.01614, audio_tagging_loss=0.008066, over 14151.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08958, pruned_loss=0.01278, audio_tagging_loss=0.008788, over 3041122.46 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:34:08,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3102020.0, ans=0.2 2023-11-27 14:34:22,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3102086.6666666665, ans=0.2 2023-11-27 14:34:23,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3102086.6666666665, ans=0.0 2023-11-27 14:34:28,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3102153.3333333335, ans=0.035 2023-11-27 14:34:34,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3102153.3333333335, ans=0.125 2023-11-27 14:34:47,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3102220.0, ans=0.125 2023-11-27 14:34:57,652 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465350 2023-11-27 14:35:03,083 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8450, loss[loss=0.0692, simple_loss=0.0988, pruned_loss=0.01123, audio_tagging_loss=0.008574, over 16047.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09037, pruned_loss=0.01293, audio_tagging_loss=0.008708, over 3042723.95 frames. 
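[Editor's note] The Whitening records compare a whiteness statistic of intermediate activations against a scheduled limit, and a penalty is applied only when the metric exceeds the limit (7.98 vs. 12.0 above, so no penalty). One plausible form of such a metric, equal to 1.0 for an isotropic covariance and growing as a few directions dominate, is sketched below; the exact formulation is assumed, not quoted from scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns
    num_channels * tr(C @ C) / tr(C)**2 for the covariance C of x:
    1.0 when C is proportional to the identity, larger when the
    eigenvalue spectrum is lopsided (an assumed formulation)."""
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]
    num_channels = cov.shape[0]
    return num_channels * (cov * cov).sum() / cov.diagonal().sum() ** 2

x = torch.randn(1000, 192)          # nearly white features
print(float(whitening_metric(x)))   # slightly above 1.0 (sampling noise)
```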
], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:35:10,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3102353.3333333335, ans=0.0 2023-11-27 14:35:20,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3102420.0, ans=0.125 2023-11-27 14:35:23,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3102420.0, ans=0.125 2023-11-27 14:35:37,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3102553.3333333335, ans=0.1 2023-11-27 14:35:46,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3102553.3333333335, ans=0.125 2023-11-27 14:35:49,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 8.838e+01 9.408e+01 1.009e+02 1.207e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 14:35:55,425 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465400 2023-11-27 14:35:58,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3102620.0, ans=0.0 2023-11-27 14:36:02,007 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8500, loss[loss=0.05716, simple_loss=0.07388, pruned_loss=0.01147, audio_tagging_loss=0.008752, over 14809.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09058, pruned_loss=0.01292, audio_tagging_loss=0.008747, over 3039898.28 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:36:09,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3102686.6666666665, ans=0.0 2023-11-27 14:36:22,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3102753.3333333335, ans=0.125 2023-11-27 14:36:32,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102820.0, ans=0.1 2023-11-27 14:36:53,805 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465450 2023-11-27 14:37:00,390 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8550, loss[loss=0.07533, simple_loss=0.09967, pruned_loss=0.01688, audio_tagging_loss=0.008616, over 15462.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.0907, pruned_loss=0.01279, audio_tagging_loss=0.008729, over 3049022.10 frames. 
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:37:08,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3103020.0, ans=0.2 2023-11-27 14:37:22,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3103153.3333333335, ans=0.125 2023-11-27 14:37:34,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3103220.0, ans=0.1 2023-11-27 14:37:36,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3103220.0, ans=0.1 2023-11-27 14:37:38,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3103220.0, ans=0.125 2023-11-27 14:37:43,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3103220.0, ans=0.125 2023-11-27 14:37:47,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.683e+01 9.146e+01 9.913e+01 1.274e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 14:37:52,102 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465500 2023-11-27 14:37:57,533 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8600, loss[loss=0.0832, simple_loss=0.1132, pruned_loss=0.01917, audio_tagging_loss=0.007435, over 15603.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09025, pruned_loss=0.0126, audio_tagging_loss=0.008751, over 3050334.78 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:38:02,313 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:38:08,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3103420.0, ans=0.125 2023-11-27 14:38:13,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3103420.0, ans=0.125 2023-11-27 14:38:18,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3103420.0, ans=0.125 2023-11-27 14:38:21,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3103486.6666666665, ans=0.125 2023-11-27 14:38:34,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0 2023-11-27 14:38:49,628 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465550 2023-11-27 14:38:49,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3103620.0, ans=0.0 2023-11-27 14:38:54,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2023-11-27 14:38:55,045 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8650, loss[loss=0.06164, simple_loss=0.08625, pruned_loss=0.00622, audio_tagging_loss=0.01229, over 16054.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09079, pruned_loss=0.01273, audio_tagging_loss=0.008736, over 3047295.63 frames. 
], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:39:36,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3103886.6666666665, ans=0.1 2023-11-27 14:39:42,544 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.459e+01 9.176e+01 1.006e+02 1.194e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-27 14:39:42,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3103953.3333333335, ans=0.04949747468305833 2023-11-27 14:39:48,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465600 2023-11-27 14:39:50,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3103953.3333333335, ans=0.1 2023-11-27 14:39:55,215 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8700, loss[loss=0.06042, simple_loss=0.0838, pruned_loss=0.009428, audio_tagging_loss=0.009091, over 16176.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09008, pruned_loss=0.01266, audio_tagging_loss=0.008879, over 3047635.25 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:40:00,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3104020.0, ans=0.125 2023-11-27 14:40:00,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3104020.0, ans=0.125 2023-11-27 14:40:07,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3104086.6666666665, ans=0.125 2023-11-27 14:40:08,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3104086.6666666665, ans=0.125 2023-11-27 14:40:18,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3104153.3333333335, ans=0.125 2023-11-27 14:40:20,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3104153.3333333335, ans=0.1 2023-11-27 14:40:42,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3104286.6666666665, ans=0.125 2023-11-27 14:40:46,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465650 2023-11-27 14:40:46,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3104286.6666666665, ans=0.025 2023-11-27 14:40:51,977 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8750, loss[loss=0.05236, simple_loss=0.06663, pruned_loss=0.008054, audio_tagging_loss=0.011, over 15316.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09073, pruned_loss=0.01263, audio_tagging_loss=0.008918, over 3043797.81 frames. 
], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:40:57,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3104353.3333333335, ans=0.125 2023-11-27 14:41:01,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3104353.3333333335, ans=0.125 2023-11-27 14:41:28,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104553.3333333335, ans=0.1 2023-11-27 14:41:29,815 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:41:33,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3104553.3333333335, ans=0.2 2023-11-27 14:41:38,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2023-11-27 14:41:39,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.863e+01 8.832e+01 9.228e+01 1.004e+02 1.241e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 14:41:41,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104620.0, ans=0.1 2023-11-27 14:41:43,941 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465700 2023-11-27 14:41:44,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3104620.0, ans=0.0 2023-11-27 14:41:49,416 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8800, loss[loss=0.05809, simple_loss=0.07345, pruned_loss=0.01198, audio_tagging_loss=0.009385, over 15283.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09169, pruned_loss=0.01284, audio_tagging_loss=0.008882, over 3044028.09 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:42:05,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2023-11-27 14:42:13,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.60 vs. limit=22.5 2023-11-27 14:42:41,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465750 2023-11-27 14:42:43,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3104953.3333333335, ans=0.125 2023-11-27 14:42:44,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2023-11-27 14:42:47,783 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8850, loss[loss=0.06665, simple_loss=0.09097, pruned_loss=0.01358, audio_tagging_loss=0.007588, over 15286.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09093, pruned_loss=0.01276, audio_tagging_loss=0.008969, over 3039698.13 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:42:54,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3105020.0, ans=0.015 2023-11-27 14:42:55,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3105020.0, ans=0.07 2023-11-27 14:43:03,122 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:43:04,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3105086.6666666665, ans=0.125 2023-11-27 14:43:14,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3105153.3333333335, ans=0.2 2023-11-27 14:43:25,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.39 vs. limit=22.5 2023-11-27 14:43:29,062 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:43:35,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.774e+01 9.216e+01 1.007e+02 1.244e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 14:43:38,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3105286.6666666665, ans=0.0 2023-11-27 14:43:40,101 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465800 2023-11-27 14:43:40,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3105286.6666666665, ans=0.95 2023-11-27 14:43:41,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3105286.6666666665, ans=0.04949747468305833 2023-11-27 14:43:45,768 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8900, loss[loss=0.06565, simple_loss=0.08591, pruned_loss=0.01437, audio_tagging_loss=0.008322, over 15285.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09142, pruned_loss=0.01274, audio_tagging_loss=0.008855, over 3050971.02 frames. 
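[Editor's note] The recurring WARNING lines show the length filter at work: a 1-second AudioSet placeholder cut carries a 24-token dummy transcript but only 23 encoder frames after subsampling, so the transducer loss cannot align it and the cut is excluded. A sketch of the check, with the front-end arithmetic chosen to reproduce the logged 100 -> 23 frame counts (the exact formula is an assumption):

```python
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    """Drop cuts whose encoder output is shorter than their token sequence.
    The '-7' models the convolutional front-end's context and is chosen so
    that 100 input frames map to the 23 logged output frames (assumed)."""
    frames_after_subsampling = (num_frames - 7) // subsampling_factor
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False -> "Exclude cut ..." as in the WARNING above
```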
], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:43:46,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3105353.3333333335, ans=0.1 2023-11-27 14:44:00,153 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:44:04,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3105420.0, ans=0.0 2023-11-27 14:44:04,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3105420.0, ans=0.125 2023-11-27 14:44:15,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3105486.6666666665, ans=0.125 2023-11-27 14:44:16,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3105486.6666666665, ans=0.125 2023-11-27 14:44:33,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3105620.0, ans=0.125 2023-11-27 14:44:36,779 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465850 2023-11-27 14:44:38,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3105620.0, ans=0.125 2023-11-27 14:44:42,239 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8950, loss[loss=0.07655, simple_loss=0.1151, pruned_loss=0.01102, audio_tagging_loss=0.007982, over 15007.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09246, pruned_loss=0.01288, audio_tagging_loss=0.008742, over 3049737.24 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:44:49,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3105686.6666666665, ans=0.0 2023-11-27 14:44:49,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3105686.6666666665, ans=0.125 2023-11-27 14:44:52,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3105753.3333333335, ans=0.1 2023-11-27 14:45:05,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=22.5 2023-11-27 14:45:18,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3105886.6666666665, ans=0.125 2023-11-27 14:45:29,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.633e+01 9.411e+01 1.036e+02 1.341e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 14:45:32,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2023-11-27 14:45:34,003 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465900 2023-11-27 14:45:39,977 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9000, loss[loss=0.1097, simple_loss=0.1552, pruned_loss=0.02418, audio_tagging_loss=0.00788, over 16382.00 frames. 
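[Editor's note] The validation pass at batch 9000, just below, again dumps attn_weights_entropy tensors from zipformer.py: per-head entropies of averaged self-attention weights, a diagnostic for heads that attend too uniformly or collapse onto single positions. A small sketch of that statistic, with shapes assumed for illustration:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, tgt_len, src_len), each row a probability
    distribution over source positions (shapes assumed). Returns one
    entropy in nats per head, averaged over target positions; low
    values mean a head attends very selectively."""
    p = attn.clamp(min=1e-20)
    ent = -(p * p.log()).sum(dim=-1)   # (num_heads, tgt_len)
    return ent.mean(dim=-1)            # (num_heads,)

uniform = torch.full((4, 10, 10), 0.1)   # perfectly flat attention
print(attn_weights_entropy(uniform))     # log(10) = 2.303 for every head
```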
], tot_loss[loss=0.06826, simple_loss=0.09319, pruned_loss=0.01302, audio_tagging_loss=0.008646, over 3046491.51 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:45:39,978 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 14:46:02,008 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8300, 2.9231, 2.8158, 2.7357, 3.3580, 3.3885, 3.1450, 3.5250], device='cuda:3') 2023-11-27 14:46:08,698 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1362, 2.4872, 4.9954, 2.9046], device='cuda:3') 2023-11-27 14:46:15,059 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.05878, simple_loss=0.0507, pruned_loss=0.005237, audio_tagging_loss=0.02819, over 4681554.00 frames. 2023-11-27 14:46:15,060 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 14:46:42,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0 2023-11-27 14:46:56,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3106220.0, ans=0.125 2023-11-27 14:47:06,651 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465950 2023-11-27 14:47:11,959 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9050, loss[loss=0.06451, simple_loss=0.08115, pruned_loss=0.01479, audio_tagging_loss=0.009151, over 15491.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.09301, pruned_loss=0.01314, audio_tagging_loss=0.008669, over 3044812.58 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:47:35,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3106486.6666666665, ans=0.125 2023-11-27 14:47:47,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3106553.3333333335, ans=0.125 2023-11-27 14:47:54,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3106553.3333333335, ans=0.2 2023-11-27 14:47:59,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.800e+01 9.356e+01 9.893e+01 1.212e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 14:47:59,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3106620.0, ans=0.125 2023-11-27 14:48:04,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466000 2023-11-27 14:48:10,413 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9100, loss[loss=0.09668, simple_loss=0.1265, pruned_loss=0.0256, audio_tagging_loss=0.007839, over 14479.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09267, pruned_loss=0.01311, audio_tagging_loss=0.008702, over 3048268.76 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:48:29,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3106753.3333333335, ans=0.125 2023-11-27 14:48:29,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. 
limit=6.0 2023-11-27 14:48:40,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-27 14:48:48,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3106886.6666666665, ans=0.125 2023-11-27 14:48:56,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0 2023-11-27 14:49:03,700 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466050 2023-11-27 14:49:09,110 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9150, loss[loss=0.06761, simple_loss=0.09516, pruned_loss=0.01234, audio_tagging_loss=0.007688, over 14387.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09218, pruned_loss=0.01315, audio_tagging_loss=0.008642, over 3043045.62 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:49:14,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3107020.0, ans=0.125 2023-11-27 14:49:36,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3107153.3333333335, ans=0.2 2023-11-27 14:49:41,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3107153.3333333335, ans=0.125 2023-11-27 14:49:44,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3107220.0, ans=0.125 2023-11-27 14:49:48,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3107220.0, ans=0.125 2023-11-27 14:49:57,785 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.571e+01 9.032e+01 9.849e+01 1.366e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-27 14:50:01,177 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466100 2023-11-27 14:50:06,683 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9200, loss[loss=0.0708, simple_loss=0.09396, pruned_loss=0.01515, audio_tagging_loss=0.008671, over 15218.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09267, pruned_loss=0.01315, audio_tagging_loss=0.008533, over 3048190.21 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:50:30,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3107486.6666666665, ans=0.0 2023-11-27 14:50:52,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3107620.0, ans=0.1 2023-11-27 14:50:58,598 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466150 2023-11-27 14:51:04,609 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9250, loss[loss=0.09699, simple_loss=0.136, pruned_loss=0.02177, audio_tagging_loss=0.007224, over 15134.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09252, pruned_loss=0.01323, audio_tagging_loss=0.008652, over 3045361.38 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:51:10,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.64 vs. 
limit=22.5 2023-11-27 14:51:18,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-11-27 14:51:21,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-11-27 14:51:39,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3107886.6666666665, ans=0.1 2023-11-27 14:51:55,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.711e+01 9.233e+01 9.983e+01 1.314e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 14:51:57,563 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466200 2023-11-27 14:52:01,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=22.5 2023-11-27 14:52:03,329 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9300, loss[loss=0.05457, simple_loss=0.07228, pruned_loss=0.01246, audio_tagging_loss=0.00597, over 14619.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09159, pruned_loss=0.01319, audio_tagging_loss=0.008656, over 3041123.94 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:52:04,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3108020.0, ans=0.125 2023-11-27 14:52:15,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-11-27 14:52:18,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3108086.6666666665, ans=0.07 2023-11-27 14:52:24,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2023-11-27 14:52:32,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3108153.3333333335, ans=0.125 2023-11-27 14:52:34,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3108153.3333333335, ans=0.125 2023-11-27 14:52:54,964 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466250 2023-11-27 14:52:59,075 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:53:00,962 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9350, loss[loss=0.07362, simple_loss=0.1022, pruned_loss=0.01493, audio_tagging_loss=0.007609, over 15126.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09064, pruned_loss=0.01303, audio_tagging_loss=0.008727, over 3043161.66 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:53:05,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2023-11-27 14:53:27,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3108486.6666666665, ans=0.0 2023-11-27 14:53:43,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3108553.3333333335, ans=0.0 2023-11-27 14:53:49,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.694e+01 9.307e+01 9.983e+01 1.185e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 14:53:52,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466300 2023-11-27 14:53:52,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3108620.0, ans=0.125 2023-11-27 14:53:58,157 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9400, loss[loss=0.07344, simple_loss=0.09595, pruned_loss=0.01615, audio_tagging_loss=0.009308, over 13865.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.08983, pruned_loss=0.01284, audio_tagging_loss=0.008853, over 3047370.27 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:54:09,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3108753.3333333335, ans=0.125 2023-11-27 14:54:51,337 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466350 2023-11-27 14:54:52,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3108953.3333333335, ans=0.0 2023-11-27 14:54:56,779 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9450, loss[loss=0.06852, simple_loss=0.09343, pruned_loss=0.01315, audio_tagging_loss=0.008653, over 14796.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08959, pruned_loss=0.01253, audio_tagging_loss=0.008962, over 3044903.28 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:54:59,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3109020.0, ans=0.0 2023-11-27 14:55:00,126 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 14:55:11,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3109086.6666666665, ans=0.0 2023-11-27 14:55:18,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3109153.3333333335, ans=0.125 2023-11-27 14:55:20,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=3109153.3333333335, ans=15.0 2023-11-27 14:55:21,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3109153.3333333335, ans=0.125 2023-11-27 14:55:30,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3109220.0, ans=0.125 2023-11-27 14:55:41,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3109220.0, ans=0.0 2023-11-27 14:55:46,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.803e+01 9.221e+01 9.903e+01 1.293e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-27 14:55:48,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3109286.6666666665, ans=0.1 2023-11-27 14:55:49,044 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466400 2023-11-27 14:55:54,757 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9500, loss[loss=0.07005, simple_loss=0.08209, pruned_loss=0.0152, audio_tagging_loss=0.0138, over 13555.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08899, pruned_loss=0.0124, audio_tagging_loss=0.009109, over 3043670.69 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:56:03,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3109353.3333333335, ans=0.125 2023-11-27 14:56:12,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3109420.0, ans=0.04949747468305833 2023-11-27 14:56:17,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3109486.6666666665, ans=0.125 2023-11-27 14:56:27,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2023-11-27 14:56:47,006 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466450 2023-11-27 14:56:48,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3109620.0, ans=0.125 2023-11-27 14:56:52,500 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9550, loss[loss=0.07445, simple_loss=0.1042, pruned_loss=0.0147, audio_tagging_loss=0.007639, over 14912.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09002, pruned_loss=0.01254, audio_tagging_loss=0.009044, over 3045026.09 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:57:01,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3109686.6666666665, ans=0.125 2023-11-27 14:57:02,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3109686.6666666665, ans=0.125 2023-11-27 14:57:08,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. limit=10.0 2023-11-27 14:57:15,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3109820.0, ans=0.2 2023-11-27 14:57:31,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3109886.6666666665, ans=0.125 2023-11-27 14:57:42,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.709e+01 9.251e+01 1.020e+02 1.249e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 14:57:45,227 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466500 2023-11-27 14:57:49,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109953.3333333335, ans=0.1 2023-11-27 14:57:49,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-27 14:57:51,304 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9600, loss[loss=0.0646, simple_loss=0.0923, pruned_loss=0.01042, audio_tagging_loss=0.008025, over 14607.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09061, pruned_loss=0.01266, audio_tagging_loss=0.009047, over 3046141.42 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:57:54,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3110020.0, ans=0.0 2023-11-27 14:58:01,383 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:58:24,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3110220.0, ans=0.125 2023-11-27 14:58:42,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466550 2023-11-27 14:58:48,147 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9650, loss[loss=0.072, simple_loss=0.09842, pruned_loss=0.01226, audio_tagging_loss=0.01054, over 15527.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08987, pruned_loss=0.01243, audio_tagging_loss=0.009042, over 3047800.61 frames. 
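[Editor's note] The tot_loss[...] fields are frame-weighted averages accumulated over a window of recent batches, which is why the "over N frames" count hovers near 3.05e6 rather than growing without bound. A simplified, unwindowed stand-in for such a tracker (icefall keeps these statistics in a MetricsTracker; the class below is illustrative, not that implementation):

```python
from collections import defaultdict

class RunningLoss:
    """Frame-weighted running averages, printed like
    'tot_loss[... over N frames]'. Unlike the real tracker, this
    version never decays or resets its window (a simplification)."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.frames = 0.0

    def update(self, losses: dict, num_frames: float) -> None:
        for name, value in losses.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = RunningLoss()
tracker.update({"loss": 0.07445, "simple_loss": 0.1042}, num_frames=14912)
print(tracker.averages()["loss"])  # 0.07445 after a single batch
```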
], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:58:49,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3110353.3333333335, ans=0.0 2023-11-27 14:58:51,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3110353.3333333335, ans=0.125 2023-11-27 14:59:35,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3110620.0, ans=0.2 2023-11-27 14:59:37,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.987e+01 9.663e+01 1.056e+02 1.477e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 14:59:39,695 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466600 2023-11-27 14:59:46,069 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9700, loss[loss=0.06588, simple_loss=0.09818, pruned_loss=0.0101, audio_tagging_loss=0.006685, over 16288.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09005, pruned_loss=0.01248, audio_tagging_loss=0.008893, over 3046165.44 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:00:14,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3110820.0, ans=0.0 2023-11-27 15:00:38,183 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466650 2023-11-27 15:00:39,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-27 15:00:44,710 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9750, loss[loss=0.06422, simple_loss=0.09064, pruned_loss=0.01234, audio_tagging_loss=0.006561, over 14548.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09039, pruned_loss=0.01259, audio_tagging_loss=0.008754, over 3044335.81 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:00:44,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3111020.0, ans=0.125 2023-11-27 15:00:47,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3111020.0, ans=0.025 2023-11-27 15:00:50,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3111020.0, ans=0.125 2023-11-27 15:00:54,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3111086.6666666665, ans=0.125 2023-11-27 15:01:06,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3111153.3333333335, ans=0.125 2023-11-27 15:01:15,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. 
limit=15.0 2023-11-27 15:01:24,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3111220.0, ans=0.2 2023-11-27 15:01:35,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.575e+01 9.224e+01 9.953e+01 1.254e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-27 15:01:36,582 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466700 2023-11-27 15:01:41,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.00 vs. limit=6.0 2023-11-27 15:01:41,862 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9800, loss[loss=0.04749, simple_loss=0.0656, pruned_loss=0.00607, audio_tagging_loss=0.008621, over 14065.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08985, pruned_loss=0.01252, audio_tagging_loss=0.00866, over 3046631.53 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:01:42,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0 2023-11-27 15:01:54,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0 2023-11-27 15:01:57,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3111420.0, ans=0.125 2023-11-27 15:02:00,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3111420.0, ans=0.125 2023-11-27 15:02:01,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3111420.0, ans=0.125 2023-11-27 15:02:08,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3111486.6666666665, ans=0.04949747468305833 2023-11-27 15:02:14,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-27 15:02:17,914 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:02:25,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3111553.3333333335, ans=0.125 2023-11-27 15:02:32,993 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466750 2023-11-27 15:02:36,200 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:02:38,368 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9850, loss[loss=0.06985, simple_loss=0.1003, pruned_loss=0.01339, audio_tagging_loss=0.0063, over 15090.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08948, pruned_loss=0.01245, audio_tagging_loss=0.00866, over 3047905.85 frames. 
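Each of these WARNING lines drops an AudioSet cut whose transcript is the 24-token dummy placeholder but whose audio is only 100 feature frames, i.e. 23 frames after the 4x convolutional subsampling, one fewer than the token count; the transducer loss is undefined when the encoder emits fewer frames than there are output tokens. A sketch of the implied length check, assuming the usual ((x - 7) // 2 + 1) // 2 subsampling arithmetic, which reproduces the logged 100 -> 23 (the function name is illustrative):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # frames surviving the 4x conv frontend: 100 -> 23
        t = ((num_frames - 7) // 2 + 1) // 2
        return t >= num_tokens

    keep_cut(100, 24)  # False -> the cut is excluded, as warned above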
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:02:45,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3111686.6666666665, ans=0.0 2023-11-27 15:03:08,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3111820.0, ans=0.125 2023-11-27 15:03:14,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3111886.6666666665, ans=0.125 2023-11-27 15:03:29,428 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.745e+01 9.208e+01 1.002e+02 1.325e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 15:03:30,659 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466800 2023-11-27 15:03:32,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3111953.3333333335, ans=0.2 2023-11-27 15:03:36,883 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9900, loss[loss=0.05599, simple_loss=0.06891, pruned_loss=0.009917, audio_tagging_loss=0.01162, over 14444.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08963, pruned_loss=0.01249, audio_tagging_loss=0.00872, over 3042023.94 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:03:47,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3112086.6666666665, ans=0.1 2023-11-27 15:03:57,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3112086.6666666665, ans=0.125 2023-11-27 15:03:57,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3112086.6666666665, ans=0.0 2023-11-27 15:04:06,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.69 vs. limit=22.5 2023-11-27 15:04:07,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3112153.3333333335, ans=0.125 2023-11-27 15:04:08,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3112153.3333333335, ans=0.95 2023-11-27 15:04:29,154 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466850 2023-11-27 15:04:33,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3112353.3333333335, ans=0.0 2023-11-27 15:04:34,669 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9950, loss[loss=0.05747, simple_loss=0.07965, pruned_loss=0.009232, audio_tagging_loss=0.008413, over 14860.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09052, pruned_loss=0.01268, audio_tagging_loss=0.008602, over 3043664.21 frames. 
], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:04:43,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3112353.3333333335, ans=0.1 2023-11-27 15:04:53,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3112420.0, ans=0.1 2023-11-27 15:04:54,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3112420.0, ans=0.0 2023-11-27 15:05:00,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3112486.6666666665, ans=0.0 2023-11-27 15:05:13,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2023-11-27 15:05:17,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3112553.3333333335, ans=0.0 2023-11-27 15:05:24,777 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.583e+01 9.133e+01 9.835e+01 1.182e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-27 15:05:25,943 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466900 2023-11-27 15:05:31,313 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10000, loss[loss=0.07771, simple_loss=0.1058, pruned_loss=0.01614, audio_tagging_loss=0.008688, over 16568.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09095, pruned_loss=0.01278, audio_tagging_loss=0.008571, over 3048187.84 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:05:33,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3112686.6666666665, ans=0.1 2023-11-27 15:05:38,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3112686.6666666665, ans=0.125 2023-11-27 15:05:47,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3112753.3333333335, ans=0.125 2023-11-27 15:05:57,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3112820.0, ans=0.125 2023-11-27 15:06:03,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3112820.0, ans=0.125 2023-11-27 15:06:10,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.61 vs. limit=10.0 2023-11-27 15:06:23,625 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466950 2023-11-27 15:06:28,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3113020.0, ans=0.125 2023-11-27 15:06:29,080 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10050, loss[loss=0.04469, simple_loss=0.0596, pruned_loss=0.005946, audio_tagging_loss=0.008944, over 15343.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09011, pruned_loss=0.01263, audio_tagging_loss=0.008564, over 3044828.95 frames. 
], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:06:40,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=22.5 2023-11-27 15:06:49,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3113086.6666666665, ans=0.2 2023-11-27 15:07:03,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3113220.0, ans=0.2 2023-11-27 15:07:13,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2023-11-27 15:07:13,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3113220.0, ans=15.0 2023-11-27 15:07:14,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-27 15:07:20,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.460e+01 9.017e+01 9.705e+01 1.338e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-27 15:07:21,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467000 2023-11-27 15:07:27,156 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10100, loss[loss=0.07971, simple_loss=0.117, pruned_loss=0.01302, audio_tagging_loss=0.008195, over 14629.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09128, pruned_loss=0.01258, audio_tagging_loss=0.008615, over 3045002.28 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:07:34,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3113353.3333333335, ans=0.0 2023-11-27 15:08:11,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3113553.3333333335, ans=0.2 2023-11-27 15:08:13,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3113620.0, ans=0.07 2023-11-27 15:08:17,376 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:08:18,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467050 2023-11-27 15:08:23,953 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10150, loss[loss=0.05087, simple_loss=0.06894, pruned_loss=0.00787, audio_tagging_loss=0.008528, over 16006.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09094, pruned_loss=0.01249, audio_tagging_loss=0.008652, over 3045764.99 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:08:55,784 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:09:11,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3113953.3333333335, ans=0.1 2023-11-27 15:09:14,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.492e+01 9.148e+01 9.869e+01 1.257e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 15:09:15,486 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467100 2023-11-27 15:09:21,474 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10200, loss[loss=0.05357, simple_loss=0.06611, pruned_loss=0.01329, audio_tagging_loss=0.007225, over 14564.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.0907, pruned_loss=0.01252, audio_tagging_loss=0.008645, over 3052860.01 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:09:28,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3114020.0, ans=0.125 2023-11-27 15:09:28,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3114020.0, ans=0.02 2023-11-27 15:09:30,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3114020.0, ans=0.125 2023-11-27 15:09:33,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3114086.6666666665, ans=0.1 2023-11-27 15:09:46,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3114153.3333333335, ans=0.125 2023-11-27 15:09:48,625 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:09:48,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3114153.3333333335, ans=0.0 2023-11-27 15:09:57,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3114220.0, ans=0.125 2023-11-27 15:10:00,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3114220.0, ans=0.0 2023-11-27 15:10:09,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3114286.6666666665, ans=0.125 2023-11-27 15:10:14,265 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467150 2023-11-27 15:10:20,422 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10250, loss[loss=0.06121, simple_loss=0.07357, pruned_loss=0.01332, audio_tagging_loss=0.01111, over 15573.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09027, pruned_loss=0.01274, audio_tagging_loss=0.008779, over 3048609.11 frames. 
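The scaling.py ScheduledFloat lines each print a module hyperparameter (dropout probabilities, skip rates, balancer probabilities, whitening limits), the global batch_count, and its current value ans. These are not learned weights; they follow fixed schedules in batch count, and by this point in training (batch_count ~ 3.11e6) they have settled on constant values, which is why the same ans repeats across records. A piecewise-linear schedule of this kind can be sketched as follows (the breakpoints below are hypothetical, not the recipe's actual values):

    def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
        """Piecewise-linear in batch_count, flat outside the breakpoints."""
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
            b0, v0 = b1, v1
        return v0

    # e.g. a skip-rate annealed from 0.5 to 0.0 over the first 20k batches:
    scheduled_float(3_109_353.0, [(0.0, 0.5), (20_000.0, 0.0)])  # -> 0.0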
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:10:59,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3114553.3333333335, ans=0.5 2023-11-27 15:11:06,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3114620.0, ans=0.125 2023-11-27 15:11:11,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.883e+01 9.540e+01 1.004e+02 1.419e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 15:11:11,670 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467200 2023-11-27 15:11:11,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3114620.0, ans=0.125 2023-11-27 15:11:17,259 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10300, loss[loss=0.06556, simple_loss=0.08905, pruned_loss=0.01159, audio_tagging_loss=0.009443, over 15038.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09005, pruned_loss=0.01274, audio_tagging_loss=0.008931, over 3047804.00 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:11:34,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3114753.3333333335, ans=0.125 2023-11-27 15:11:44,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3114820.0, ans=0.125 2023-11-27 15:11:49,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3114820.0, ans=0.0 2023-11-27 15:11:54,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3114886.6666666665, ans=0.125 2023-11-27 15:11:55,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5 2023-11-27 15:11:59,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3114886.6666666665, ans=0.2 2023-11-27 15:12:06,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3114953.3333333335, ans=0.125 2023-11-27 15:12:08,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2023-11-27 15:12:09,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467250 2023-11-27 15:12:10,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3114953.3333333335, ans=0.125 2023-11-27 15:12:14,905 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10350, loss[loss=0.07019, simple_loss=0.1081, pruned_loss=0.009858, audio_tagging_loss=0.006262, over 15037.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09053, pruned_loss=0.01278, audio_tagging_loss=0.009052, over 3047855.04 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:12:18,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3115020.0, ans=0.125 2023-11-27 15:12:24,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. 
limit=6.0 2023-11-27 15:12:43,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3115153.3333333335, ans=0.125 2023-11-27 15:12:55,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3115220.0, ans=0.125 2023-11-27 15:13:02,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3115286.6666666665, ans=0.125 2023-11-27 15:13:07,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.698e+01 9.408e+01 1.024e+02 1.336e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 15:13:07,740 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467300 2023-11-27 15:13:07,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3115286.6666666665, ans=0.0 2023-11-27 15:13:13,065 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10400, loss[loss=0.06378, simple_loss=0.08691, pruned_loss=0.01013, audio_tagging_loss=0.01021, over 15344.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09066, pruned_loss=0.0126, audio_tagging_loss=0.009059, over 3046990.89 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:13:14,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3115353.3333333335, ans=0.125 2023-11-27 15:13:33,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3115420.0, ans=0.125 2023-11-27 15:13:41,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3115486.6666666665, ans=0.2 2023-11-27 15:13:52,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3115553.3333333335, ans=0.05 2023-11-27 15:14:01,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3115620.0, ans=0.025 2023-11-27 15:14:05,093 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467350 2023-11-27 15:14:08,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3115620.0, ans=0.0 2023-11-27 15:14:10,442 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10450, loss[loss=0.06855, simple_loss=0.0958, pruned_loss=0.01307, audio_tagging_loss=0.007582, over 15916.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09026, pruned_loss=0.01262, audio_tagging_loss=0.009072, over 3046385.82 frames. 
], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:14:14,070 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:14:18,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3115686.6666666665, ans=0.125 2023-11-27 15:15:02,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.776e+01 9.506e+01 1.066e+02 3.679e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-27 15:15:02,162 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467400 2023-11-27 15:15:03,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3115953.3333333335, ans=0.125 2023-11-27 15:15:08,013 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10500, loss[loss=0.07003, simple_loss=0.09933, pruned_loss=0.01203, audio_tagging_loss=0.008334, over 15380.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09054, pruned_loss=0.01261, audio_tagging_loss=0.00894, over 3043013.53 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:15:37,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3116153.3333333335, ans=0.125 2023-11-27 15:15:48,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3116220.0, ans=0.95 2023-11-27 15:15:53,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3116286.6666666665, ans=0.125 2023-11-27 15:16:00,293 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467450 2023-11-27 15:16:00,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3116286.6666666665, ans=0.0 2023-11-27 15:16:06,291 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10550, loss[loss=0.06311, simple_loss=0.08034, pruned_loss=0.01226, audio_tagging_loss=0.01068, over 15496.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09094, pruned_loss=0.0126, audio_tagging_loss=0.008705, over 3047718.91 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:16:06,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2023-11-27 15:16:08,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. 
limit=15.0 2023-11-27 15:16:24,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3116420.0, ans=0.1 2023-11-27 15:16:32,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3116486.6666666665, ans=0.125 2023-11-27 15:16:57,560 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467500 2023-11-27 15:16:58,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.619e+01 9.191e+01 9.903e+01 2.574e+02, threshold=1.838e+02, percent-clipped=2.0 2023-11-27 15:17:03,601 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10600, loss[loss=0.06712, simple_loss=0.09702, pruned_loss=0.01229, audio_tagging_loss=0.00632, over 15726.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09117, pruned_loss=0.01274, audio_tagging_loss=0.008697, over 3049633.88 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:17:08,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-27 15:17:08,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-27 15:17:10,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-27 15:17:10,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3116686.6666666665, ans=10.0 2023-11-27 15:17:24,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3116753.3333333335, ans=0.0 2023-11-27 15:17:40,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3116886.6666666665, ans=0.1 2023-11-27 15:17:55,350 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467550 2023-11-27 15:18:00,691 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10650, loss[loss=0.06819, simple_loss=0.08002, pruned_loss=0.01591, audio_tagging_loss=0.01227, over 13829.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09153, pruned_loss=0.01274, audio_tagging_loss=0.008691, over 3055877.50 frames. 
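The Whitening lines monitor how close each module's activations are to a white (isotropic) covariance: metric is a whiteness statistic that is 1.0 for perfectly white features and grows as the covariance concentrates, and the module only pushes back on the activations when the metric exceeds its limit; every "metric=... vs. limit=..." record in this section is below its limit. One statistic with exactly these properties (1.0 when white, C when rank-1) is C * sum(lambda_i^2) / (sum(lambda_i))^2 over the covariance eigenvalues; it is sketched below for illustration, without claiming it is scaling.py's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        c = x.shape[-1]
        cg = c // num_groups
        xg = x.reshape(-1, num_groups, cg).transpose(0, 1)   # (groups, frames, cg)
        cov = xg.transpose(1, 2) @ xg / xg.shape[1]          # per-group covariance
        frob2 = (cov * cov).sum(dim=(1, 2))                  # sum of squared eigenvalues
        trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1)      # sum of eigenvalues
        return (cg * frob2 / trace.clamp(min=1e-20) ** 2).mean()

    whitening_metric(torch.randn(1000, 256))  # ~1.3 for white noise, far under these limits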
], batch size: 53, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:18:10,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3117020.0, ans=0.125 2023-11-27 15:18:10,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3117020.0, ans=0.2 2023-11-27 15:18:23,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3117153.3333333335, ans=0.125 2023-11-27 15:18:23,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3117153.3333333335, ans=0.1 2023-11-27 15:18:25,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3117153.3333333335, ans=0.125 2023-11-27 15:18:44,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3117220.0, ans=0.125 2023-11-27 15:18:45,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3117286.6666666665, ans=0.125 2023-11-27 15:18:52,855 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467600 2023-11-27 15:18:55,394 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.520e+01 9.175e+01 9.888e+01 1.340e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-27 15:18:57,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0 2023-11-27 15:18:59,301 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10700, loss[loss=0.06872, simple_loss=0.09715, pruned_loss=0.01433, audio_tagging_loss=0.005819, over 16128.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.0911, pruned_loss=0.01271, audio_tagging_loss=0.008728, over 3058272.32 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:19:20,060 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:19:21,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3117486.6666666665, ans=0.125 2023-11-27 15:19:31,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3117553.3333333335, ans=0.0 2023-11-27 15:19:50,905 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467650 2023-11-27 15:19:56,269 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10750, loss[loss=0.07674, simple_loss=0.1059, pruned_loss=0.01385, audio_tagging_loss=0.009971, over 16679.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09118, pruned_loss=0.01279, audio_tagging_loss=0.008711, over 3060896.60 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:20:35,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0 2023-11-27 15:20:47,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467700 2023-11-27 15:20:47,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.87 vs. 
limit=15.0 2023-11-27 15:20:49,907 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.438e+01 8.439e+01 9.244e+01 9.878e+01 1.512e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 15:20:53,261 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10800, loss[loss=0.06683, simple_loss=0.09879, pruned_loss=0.01151, audio_tagging_loss=0.005923, over 15748.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09081, pruned_loss=0.01272, audio_tagging_loss=0.008688, over 3057025.71 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:20:59,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3118020.0, ans=0.0 2023-11-27 15:21:00,474 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:21:06,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3118086.6666666665, ans=0.125 2023-11-27 15:21:22,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3118153.3333333335, ans=0.0 2023-11-27 15:21:35,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3118220.0, ans=0.125 2023-11-27 15:21:44,949 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467750 2023-11-27 15:21:50,794 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10850, loss[loss=0.05917, simple_loss=0.06983, pruned_loss=0.01543, audio_tagging_loss=0.008824, over 16119.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09126, pruned_loss=0.01287, audio_tagging_loss=0.008718, over 3057791.42 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:21:53,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3118353.3333333335, ans=10.0 2023-11-27 15:21:55,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3118353.3333333335, ans=0.125 2023-11-27 15:22:01,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3118420.0, ans=0.125 2023-11-27 15:22:03,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.99 vs. limit=10.0 2023-11-27 15:22:04,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-11-27 15:22:24,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.49 vs. 
limit=22.5 2023-11-27 15:22:30,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3118553.3333333335, ans=0.125 2023-11-27 15:22:31,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3118553.3333333335, ans=0.125 2023-11-27 15:22:36,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3118620.0, ans=0.2 2023-11-27 15:22:41,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3118620.0, ans=0.1 2023-11-27 15:22:43,093 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467800 2023-11-27 15:22:46,566 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.986e+01 9.693e+01 1.013e+02 1.433e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 15:22:48,720 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10900, loss[loss=0.06979, simple_loss=0.0973, pruned_loss=0.01374, audio_tagging_loss=0.0074, over 14732.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09087, pruned_loss=0.0128, audio_tagging_loss=0.008755, over 3048997.76 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:22:49,817 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:23:10,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3118820.0, ans=0.1 2023-11-27 15:23:12,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3118820.0, ans=0.125 2023-11-27 15:23:18,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3118820.0, ans=0.125 2023-11-27 15:23:22,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3118820.0, ans=15.0 2023-11-27 15:23:22,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3118886.6666666665, ans=0.125 2023-11-27 15:23:40,183 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467850 2023-11-27 15:23:45,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3119020.0, ans=15.0 2023-11-27 15:23:45,537 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10950, loss[loss=0.05519, simple_loss=0.07441, pruned_loss=0.007854, audio_tagging_loss=0.01013, over 14672.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09107, pruned_loss=0.01283, audio_tagging_loss=0.008822, over 3047333.66 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:23:45,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3119020.0, ans=0.125 2023-11-27 15:23:58,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0 2023-11-27 15:24:09,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3119153.3333333335, ans=0.035 2023-11-27 15:24:12,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3119153.3333333335, ans=0.2 2023-11-27 15:24:12,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3119153.3333333335, ans=0.0 2023-11-27 15:24:21,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3119220.0, ans=0.125 2023-11-27 15:24:37,558 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467900 2023-11-27 15:24:40,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.952e+01 9.286e+01 1.000e+02 1.370e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 15:24:42,830 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11000, loss[loss=0.05062, simple_loss=0.06745, pruned_loss=0.005051, audio_tagging_loss=0.01185, over 14835.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09105, pruned_loss=0.01285, audio_tagging_loss=0.008859, over 3048220.06 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:24:43,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3119353.3333333335, ans=0.125 2023-11-27 15:24:57,074 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:25:07,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3119486.6666666665, ans=0.125 2023-11-27 15:25:17,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3119553.3333333335, ans=0.125 2023-11-27 15:25:18,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3119553.3333333335, ans=0.0 2023-11-27 15:25:35,462 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467950 2023-11-27 15:25:37,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3119620.0, ans=0.125 2023-11-27 15:25:38,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3119620.0, ans=0.0 2023-11-27 15:25:39,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.81 vs. 
limit=15.0 2023-11-27 15:25:40,950 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11050, loss[loss=0.06882, simple_loss=0.08598, pruned_loss=0.01698, audio_tagging_loss=0.008855, over 14390.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09093, pruned_loss=0.01287, audio_tagging_loss=0.008946, over 3042420.08 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:25:54,377 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:25:56,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3119753.3333333335, ans=0.1 2023-11-27 15:26:00,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3119753.3333333335, ans=0.125 2023-11-27 15:26:02,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3119820.0, ans=0.1 2023-11-27 15:26:08,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3119820.0, ans=0.5 2023-11-27 15:26:25,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3119953.3333333335, ans=0.125 2023-11-27 15:26:31,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468000 2023-11-27 15:26:37,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.786e+01 9.414e+01 9.890e+01 1.526e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 15:26:39,330 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11100, loss[loss=0.05563, simple_loss=0.07205, pruned_loss=0.008809, audio_tagging_loss=0.0108, over 15561.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09021, pruned_loss=0.01269, audio_tagging_loss=0.009039, over 3051334.25 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:27:14,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3120220.0, ans=0.125 2023-11-27 15:27:30,433 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468050 2023-11-27 15:27:34,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3120286.6666666665, ans=0.0 2023-11-27 15:27:36,995 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11150, loss[loss=0.06606, simple_loss=0.09259, pruned_loss=0.01197, audio_tagging_loss=0.007799, over 16211.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09014, pruned_loss=0.01263, audio_tagging_loss=0.009154, over 3047267.36 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:28:00,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs. 
limit=6.0 2023-11-27 15:28:12,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3120553.3333333335, ans=0.125 2023-11-27 15:28:18,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3120553.3333333335, ans=0.125 2023-11-27 15:28:29,070 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468100 2023-11-27 15:28:29,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3120620.0, ans=0.2 2023-11-27 15:28:29,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2023-11-27 15:28:32,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.619e+01 9.079e+01 9.995e+01 1.250e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 15:28:34,467 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11200, loss[loss=0.05617, simple_loss=0.07769, pruned_loss=0.009987, audio_tagging_loss=0.007336, over 15195.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09012, pruned_loss=0.0125, audio_tagging_loss=0.009248, over 3052427.38 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:28:43,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3120686.6666666665, ans=0.2 2023-11-27 15:28:45,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3120753.3333333335, ans=0.125 2023-11-27 15:28:54,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3120753.3333333335, ans=0.09899494936611666 2023-11-27 15:28:56,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.93 vs. limit=6.0 2023-11-27 15:29:05,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3120820.0, ans=0.0 2023-11-27 15:29:13,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0 2023-11-27 15:29:15,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3120886.6666666665, ans=0.125 2023-11-27 15:29:25,877 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468150 2023-11-27 15:29:31,260 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11250, loss[loss=0.05495, simple_loss=0.06975, pruned_loss=0.01017, audio_tagging_loss=0.009909, over 15892.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08915, pruned_loss=0.01244, audio_tagging_loss=0.009205, over 3052395.48 frames. 
], batch size: 62, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:29:34,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3121020.0, ans=0.125 2023-11-27 15:29:37,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3121020.0, ans=0.125 2023-11-27 15:29:54,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3121153.3333333335, ans=0.1 2023-11-27 15:30:09,501 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:30:19,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-27 15:30:22,700 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468200 2023-11-27 15:30:27,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.697e+01 9.472e+01 1.045e+02 1.319e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 15:30:28,835 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11300, loss[loss=0.05997, simple_loss=0.0812, pruned_loss=0.01019, audio_tagging_loss=0.009179, over 15278.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09028, pruned_loss=0.01268, audio_tagging_loss=0.009032, over 3052920.53 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:30:45,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3121420.0, ans=0.125 2023-11-27 15:30:56,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3121486.6666666665, ans=0.0 2023-11-27 15:31:00,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3121486.6666666665, ans=0.125 2023-11-27 15:31:20,422 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468250 2023-11-27 15:31:25,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3121686.6666666665, ans=0.125 2023-11-27 15:31:26,446 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11350, loss[loss=0.08244, simple_loss=0.1154, pruned_loss=0.01921, audio_tagging_loss=0.005516, over 15189.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09053, pruned_loss=0.01261, audio_tagging_loss=0.008983, over 3057692.69 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:32:04,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3121886.6666666665, ans=0.0 2023-11-27 15:32:17,326 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468300 2023-11-27 15:32:21,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.462e+01 9.162e+01 9.878e+01 1.221e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 15:32:22,615 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11400, loss[loss=0.05865, simple_loss=0.08527, pruned_loss=0.01062, audio_tagging_loss=0.005397, over 16469.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09042, pruned_loss=0.01254, audio_tagging_loss=0.009008, over 3055680.93 frames. 
], batch size: 65, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:33:00,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3122220.0, ans=0.125 2023-11-27 15:33:00,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3122220.0, ans=0.0 2023-11-27 15:33:13,408 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468350 2023-11-27 15:33:14,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3122286.6666666665, ans=0.2 2023-11-27 15:33:18,862 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11450, loss[loss=0.06691, simple_loss=0.09058, pruned_loss=0.0123, audio_tagging_loss=0.009319, over 16320.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09048, pruned_loss=0.0126, audio_tagging_loss=0.008885, over 3057089.80 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:33:53,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3122553.3333333335, ans=0.0 2023-11-27 15:34:01,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3122553.3333333335, ans=0.2 2023-11-27 15:34:11,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468400 2023-11-27 15:34:16,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.713e+01 9.522e+01 1.023e+02 1.434e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 15:34:17,527 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11500, loss[loss=0.06495, simple_loss=0.08737, pruned_loss=0.009043, audio_tagging_loss=0.01222, over 15565.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09048, pruned_loss=0.01267, audio_tagging_loss=0.008834, over 3049415.32 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:34:34,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3122753.3333333335, ans=0.1 2023-11-27 15:34:35,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3122753.3333333335, ans=0.125 2023-11-27 15:34:36,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=12.0 2023-11-27 15:35:05,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3122953.3333333335, ans=0.0 2023-11-27 15:35:09,552 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468450 2023-11-27 15:35:15,020 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11550, loss[loss=0.07311, simple_loss=0.1017, pruned_loss=0.01559, audio_tagging_loss=0.006669, over 16034.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09086, pruned_loss=0.01281, audio_tagging_loss=0.008729, over 3048058.13 frames. 
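grad_scale in the batch lines is the fp16 dynamic loss scale, not a model quantity; over this epoch it steps among 32.0, 16.0 and 8.0, the signature of automatic mixed-precision scaling that halves the scale when a gradient overflows and doubles it again after a long run of clean steps. A self-contained torch sketch of the mechanism (toy model and init_scale for illustration; not the recipe's actual training loop):

    import torch

    model = torch.nn.Linear(10, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)   # illustrative init

    for _ in range(3):
        x = torch.randn(4, 10, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(x).square().mean()
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped if the unscaled grads contain inf/nan
        scaler.update()          # halve on overflow, double after a clean stretch
    print(scaler.get_scale())    # moves in powers of two, like grad_scale here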
2023-11-27 15:35:21,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3123020.0, ans=0.125
2023-11-27 15:35:28,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3123086.6666666665, ans=0.04949747468305833
2023-11-27 15:35:33,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3123086.6666666665, ans=0.0
2023-11-27 15:35:54,360 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 15:35:57,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3123220.0, ans=0.2
2023-11-27 15:36:06,435 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468500
2023-11-27 15:36:10,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.754e+01 9.552e+01 1.002e+02 2.038e+02, threshold=1.910e+02, percent-clipped=1.0
2023-11-27 15:36:11,822 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11600, loss[loss=0.07581, simple_loss=0.1001, pruned_loss=0.01743, audio_tagging_loss=0.008318, over 14208.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09152, pruned_loss=0.01284, audio_tagging_loss=0.008661, over 3047757.85 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:36:12,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3123353.3333333335, ans=0.95
2023-11-27 15:36:20,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3123353.3333333335, ans=0.0
2023-11-27 15:36:23,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3123420.0, ans=0.125
2023-11-27 15:36:28,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3123420.0, ans=0.1
2023-11-27 15:36:29,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3123420.0, ans=0.125
2023-11-27 15:36:33,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3123420.0, ans=0.0
2023-11-27 15:36:40,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5
2023-11-27 15:36:48,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3123553.3333333335, ans=0.125
2023-11-27 15:36:54,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3123553.3333333335, ans=0.125
2023-11-27 15:37:00,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3123620.0, ans=0.125
2023-11-27 15:37:03,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468550
2023-11-27 15:37:09,512 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11650, loss[loss=0.05569, simple_loss=0.07721, pruned_loss=0.007943, audio_tagging_loss=0.009137, over 15727.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09117, pruned_loss=0.01285, audio_tagging_loss=0.00873, over 3041435.87 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:37:14,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3123686.6666666665, ans=0.0
2023-11-27 15:37:29,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3123753.3333333335, ans=0.0
2023-11-27 15:37:42,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3123886.6666666665, ans=0.125
2023-11-27 15:38:01,341 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468600
2023-11-27 15:38:06,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.764e+01 9.242e+01 1.012e+02 1.452e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-27 15:38:07,817 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11700, loss[loss=0.05301, simple_loss=0.06879, pruned_loss=0.006814, audio_tagging_loss=0.01181, over 14194.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09033, pruned_loss=0.01277, audio_tagging_loss=0.008776, over 3041087.59 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:38:13,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3124020.0, ans=0.125
2023-11-27 15:38:21,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0
2023-11-27 15:38:50,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3124220.0, ans=0.0
2023-11-27 15:38:59,037 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468650
2023-11-27 15:39:04,392 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11750, loss[loss=0.05852, simple_loss=0.07313, pruned_loss=0.01054, audio_tagging_loss=0.01141, over 14468.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08914, pruned_loss=0.01254, audio_tagging_loss=0.008815, over 3045457.94 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:39:29,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0
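[Editor's note] The "Whitening: name=..., metric=M vs. limit=L" records compare a per-module whiteness statistic against its configured limit. A hedged reconstruction of such a metric, assuming it measures eigenvalue dispersion of the activation covariance (exactly 1.0 when the features are perfectly white, larger when a few directions dominate); this is an illustration, not a copy of scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (frames, channels). Ratio of the mean squared eigenvalue to the
        # squared mean eigenvalue of the covariance; equals 1.0 exactly when
        # the covariance is a multiple of the identity.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return eigs.pow(2).mean() / (eigs.mean().pow(2) + 1e-20)

    print(whitening_metric(torch.randn(4000, 256)))  # close to 1 for white noise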
2023-11-27 15:39:34,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3124486.6666666665, ans=0.0
2023-11-27 15:39:56,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468700
2023-11-27 15:40:00,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.652e+01 8.569e+01 9.104e+01 9.733e+01 1.192e+02, threshold=1.821e+02, percent-clipped=0.0
2023-11-27 15:40:01,917 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11800, loss[loss=0.06865, simple_loss=0.09738, pruned_loss=0.01282, audio_tagging_loss=0.007137, over 15195.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08925, pruned_loss=0.01256, audio_tagging_loss=0.008847, over 3044357.09 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:40:16,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3124753.3333333335, ans=0.125
2023-11-27 15:40:31,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3124820.0, ans=0.0
2023-11-27 15:40:45,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3124886.6666666665, ans=0.1
2023-11-27 15:40:53,868 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468750
2023-11-27 15:40:59,250 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11850, loss[loss=0.08019, simple_loss=0.1085, pruned_loss=0.01759, audio_tagging_loss=0.008357, over 16192.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08893, pruned_loss=0.01248, audio_tagging_loss=0.009005, over 3042366.87 frames. ], batch size: 62, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:41:07,636 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:41:26,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3125153.3333333335, ans=0.125
2023-11-27 15:41:50,321 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468800
2023-11-27 15:41:55,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.519e+01 9.146e+01 9.837e+01 1.247e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-27 15:41:56,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=12.0
2023-11-27 15:41:56,673 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11900, loss[loss=0.07331, simple_loss=0.09713, pruned_loss=0.01461, audio_tagging_loss=0.01013, over 15573.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08959, pruned_loss=0.01241, audio_tagging_loss=0.008991, over 3044287.47 frames. ], batch size: 60, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:42:40,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0
2023-11-27 15:42:45,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3125620.0, ans=0.125
2023-11-27 15:42:47,617 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468850
2023-11-27 15:42:53,403 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11950, loss[loss=0.06452, simple_loss=0.08623, pruned_loss=0.01188, audio_tagging_loss=0.009517, over 15144.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.0901, pruned_loss=0.01237, audio_tagging_loss=0.009024, over 3048115.29 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0
2023-11-27 15:42:54,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3125686.6666666665, ans=0.0
2023-11-27 15:43:02,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0
2023-11-27 15:43:16,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3125820.0, ans=0.0
2023-11-27 15:43:44,372 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468900
2023-11-27 15:43:47,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=3125953.3333333335, ans=12.0
2023-11-27 15:43:48,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.695e+01 9.240e+01 9.952e+01 1.274e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-27 15:43:49,571 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 12000, loss[loss=0.06111, simple_loss=0.06934, pruned_loss=0.01385, audio_tagging_loss=0.01259, over 15001.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08967, pruned_loss=0.01259, audio_tagging_loss=0.009231, over 3051926.89 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 32.0
2023-11-27 15:43:49,571 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-27 15:44:24,024 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.05766, simple_loss=0.05064, pruned_loss=0.005162, audio_tagging_loss=0.02718, over 4681554.00 frames.
2023-11-27 15:44:24,024 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-27 15:44:27,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3126020.0, ans=0.125
2023-11-27 15:45:06,133 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 0, loss[loss=0.07549, simple_loss=0.1002, pruned_loss=0.008111, audio_tagging_loss=0.01727, over 15514.00 frames. ], tot_loss[loss=0.07549, simple_loss=0.1002, pruned_loss=0.008111, audio_tagging_loss=0.01727, over 15514.00 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 15:45:06,133 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-27 15:45:41,195 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.05772, simple_loss=0.0507, pruned_loss=0.005215, audio_tagging_loss=0.02715, over 4681554.00 frames.
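[Editor's note] grad_scale climbs 8.0 -> 16.0 -> 32.0 across the batches above and, early in epoch 40 below, drops back to 16.0 and then 8.0 before growing again. That is the signature of fp16 dynamic loss scaling: the scale doubles after a run of overflow-free steps and is halved whenever an overflow is detected. A minimal PyTorch sketch; the growth_interval and the toy model are illustrative, not the recipe's actual settings:

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=1000)
    for _ in range(3):
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(4, 80, device="cuda")).mean()
        scaler.scale(loss).backward()
        scaler.step(opt)          # skipped internally if grads overflowed
        scaler.update()           # doubles or halves the scale here
        print(scaler.get_scale())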
2023-11-27 15:45:41,195 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-27 15:45:48,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3126186.6666666665, ans=0.125
2023-11-27 15:45:53,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3126253.3333333335, ans=0.125
2023-11-27 15:45:57,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3126253.3333333335, ans=0.0
2023-11-27 15:46:04,303 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468950
2023-11-27 15:46:35,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3126453.3333333335, ans=0.2
2023-11-27 15:46:39,232 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 50, loss[loss=0.08341, simple_loss=0.1125, pruned_loss=0.0144, audio_tagging_loss=0.01278, over 15304.00 frames. ], tot_loss[loss=0.07319, simple_loss=0.08827, pruned_loss=0.01207, audio_tagging_loss=0.01698, over 688931.47 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 15:46:46,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0
2023-11-27 15:46:53,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3126586.6666666665, ans=0.125
2023-11-27 15:46:54,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3126586.6666666665, ans=0.025
2023-11-27 15:47:01,918 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469000
2023-11-27 15:47:06,530 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 9.209e+01 9.877e+01 1.086e+02 2.497e+02, threshold=1.975e+02, percent-clipped=1.0
2023-11-27 15:47:29,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3126786.6666666665, ans=0.0
2023-11-27 15:47:36,520 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 100, loss[loss=0.06303, simple_loss=0.0867, pruned_loss=0.008468, audio_tagging_loss=0.01121, over 15321.00 frames. ], tot_loss[loss=0.07382, simple_loss=0.09066, pruned_loss=0.0124, audio_tagging_loss=0.01609, over 1207560.39 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 15:47:37,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3126853.3333333335, ans=0.0
2023-11-27 15:47:45,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126853.3333333335, ans=0.1
2023-11-27 15:47:49,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3126920.0, ans=0.2
2023-11-27 15:48:00,214 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469050
2023-11-27 15:48:05,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3126986.6666666665, ans=0.2
2023-11-27 15:48:10,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3127053.3333333335, ans=0.125
2023-11-27 15:48:14,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3127053.3333333335, ans=0.125
2023-11-27 15:48:28,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3127120.0, ans=0.125
2023-11-27 15:48:28,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3127120.0, ans=0.125
2023-11-27 15:48:34,270 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 150, loss[loss=0.06018, simple_loss=0.07813, pruned_loss=0.007693, audio_tagging_loss=0.01342, over 15609.00 frames. ], tot_loss[loss=0.07252, simple_loss=0.09082, pruned_loss=0.01256, audio_tagging_loss=0.01455, over 1619466.18 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 15:48:45,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5
2023-11-27 15:48:56,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3127253.3333333335, ans=0.0
2023-11-27 15:48:58,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469100
2023-11-27 15:49:03,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 9.129e+01 9.870e+01 1.058e+02 1.571e+02, threshold=1.974e+02, percent-clipped=0.0
2023-11-27 15:49:23,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3127453.3333333335, ans=10.0
2023-11-27 15:49:32,935 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 200, loss[loss=0.06383, simple_loss=0.08485, pruned_loss=0.01232, audio_tagging_loss=0.009077, over 15355.00 frames. ], tot_loss[loss=0.07023, simple_loss=0.09004, pruned_loss=0.01231, audio_tagging_loss=0.0129, over 1933444.67 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0
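[Editor's note] Each "ScheduledFloat: name=..., batch_count=..., ans=..." record prints the current value ("ans") of a hyperparameter that is scheduled as a function of batch_count. A toy piecewise-linear reconstruction; the breakpoints below are invented for illustration, since the log only shows the current value:

    def scheduled_float(batch_count: float,
                        schedule=((0.0, 0.3), (20000.0, 0.125))) -> float:
        # Linear interpolation between (batch_count, value) breakpoints,
        # flat beyond the last one. The breakpoints here are hypothetical.
        x0, y0 = schedule[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in schedule[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return schedule[-1][1]

    print(scheduled_float(3126853.33))  # -> 0.125, long past the last breakpoint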
2023-11-27 15:49:33,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3127520.0, ans=0.07
2023-11-27 15:49:48,518 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:49:55,014 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469150
2023-11-27 15:49:55,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0
2023-11-27 15:50:06,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3127720.0, ans=0.0
2023-11-27 15:50:29,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3127853.3333333335, ans=0.09899494936611666
2023-11-27 15:50:29,962 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 250, loss[loss=0.06653, simple_loss=0.09474, pruned_loss=0.01186, audio_tagging_loss=0.007304, over 16006.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.09013, pruned_loss=0.01257, audio_tagging_loss=0.01165, over 2176341.71 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:50:31,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3127853.3333333335, ans=0.0
2023-11-27 15:50:43,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3127920.0, ans=0.05
2023-11-27 15:50:45,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3127920.0, ans=0.0
2023-11-27 15:50:50,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3127920.0, ans=15.0
2023-11-27 15:50:51,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0
2023-11-27 15:50:53,427 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469200
2023-11-27 15:50:59,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.939e+01 9.454e+01 1.026e+02 1.364e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-27 15:51:02,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5
2023-11-27 15:51:06,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3128053.3333333335, ans=0.1
2023-11-27 15:51:16,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3128120.0, ans=0.125
2023-11-27 15:51:17,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3128120.0, ans=0.125
2023-11-27 15:51:19,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3128120.0, ans=0.04949747468305833
2023-11-27 15:51:26,658 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 300, loss[loss=0.05812, simple_loss=0.07598, pruned_loss=0.009201, audio_tagging_loss=0.01093, over 15354.00 frames. ], tot_loss[loss=0.06858, simple_loss=0.09045, pruned_loss=0.01262, audio_tagging_loss=0.01074, over 2377517.82 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:51:32,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3128186.6666666665, ans=0.0
2023-11-27 15:51:37,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0
2023-11-27 15:51:50,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469250
2023-11-27 15:51:50,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.73 vs. limit=15.0
2023-11-27 15:52:01,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3128386.6666666665, ans=0.0
2023-11-27 15:52:18,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3128453.3333333335, ans=0.0
2023-11-27 15:52:24,500 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 350, loss[loss=0.06026, simple_loss=0.08709, pruned_loss=0.007999, audio_tagging_loss=0.008713, over 15528.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09105, pruned_loss=0.01261, audio_tagging_loss=0.01011, over 2527298.47 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:52:36,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3128586.6666666665, ans=0.125
2023-11-27 15:52:45,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3128653.3333333335, ans=0.2
2023-11-27 15:52:46,867 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469300
2023-11-27 15:52:49,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3128653.3333333335, ans=0.0
2023-11-27 15:52:51,280 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 15:52:52,151 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.667e+01 9.273e+01 1.018e+02 1.811e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-27 15:53:02,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3128720.0, ans=0.125
2023-11-27 15:53:12,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0
2023-11-27 15:53:14,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3128786.6666666665, ans=0.2
2023-11-27 15:53:21,707 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 400, loss[loss=0.0449, simple_loss=0.05804, pruned_loss=0.004523, audio_tagging_loss=0.01136, over 15792.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09032, pruned_loss=0.01234, audio_tagging_loss=0.009805, over 2639281.98 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 32.0
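[Editor's note] The learning rate is 1.72e-03 throughout the tail of epoch 39 and 1.70e-03 once epoch 40 starts. Both values are reproduced by an Eden-style schedule that decays with the inverse fourth root in both steps and epochs, assuming base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and an epoch factor driven by completed epochs; this is a back-fit, not a quote from the scheduler code:

    def eden_lr(base_lr: float, step: int, epoch: int,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # With step ~ 469000 this gives ~1.72e-3 for epoch=39 and ~1.70e-3
        # for epoch=40, matching the logged values to three digits.
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = (((epoch - 1) ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, 469000, 39), eden_lr(0.045, 469000, 40))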
2023-11-27 15:53:22,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3128853.3333333335, ans=0.0
2023-11-27 15:53:25,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3128853.3333333335, ans=0.1
2023-11-27 15:53:34,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=22.5
2023-11-27 15:53:44,298 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469350
2023-11-27 15:53:52,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.31 vs. limit=15.0
2023-11-27 15:54:08,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0
2023-11-27 15:54:17,407 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 450, loss[loss=0.07136, simple_loss=0.08939, pruned_loss=0.01558, audio_tagging_loss=0.01109, over 15237.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08959, pruned_loss=0.01217, audio_tagging_loss=0.009569, over 2727311.31 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:54:20,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3129186.6666666665, ans=0.025
2023-11-27 15:54:26,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3129186.6666666665, ans=0.2
2023-11-27 15:54:27,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3129186.6666666665, ans=0.125
2023-11-27 15:54:33,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3129253.3333333335, ans=0.0
2023-11-27 15:54:41,482 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469400
2023-11-27 15:54:48,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.585e+01 9.092e+01 1.003e+02 1.210e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-27 15:54:49,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3129320.0, ans=0.125
2023-11-27 15:54:57,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3129386.6666666665, ans=0.07
2023-11-27 15:55:00,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3129386.6666666665, ans=0.0
2023-11-27 15:55:03,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3129453.3333333335, ans=0.2
2023-11-27 15:55:13,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3129453.3333333335, ans=0.025
2023-11-27 15:55:14,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3129453.3333333335, ans=0.125
2023-11-27 15:55:16,567 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 500, loss[loss=0.07401, simple_loss=0.1009, pruned_loss=0.01384, audio_tagging_loss=0.009727, over 14556.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08913, pruned_loss=0.01209, audio_tagging_loss=0.009331, over 2795336.37 frames. ], batch size: 52, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:55:25,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3129520.0, ans=0.2
2023-11-27 15:55:35,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3129586.6666666665, ans=0.025
2023-11-27 15:55:39,472 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469450
2023-11-27 15:56:04,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3129786.6666666665, ans=0.1
2023-11-27 15:56:14,250 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 550, loss[loss=0.06717, simple_loss=0.09728, pruned_loss=0.00901, audio_tagging_loss=0.00952, over 14635.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08872, pruned_loss=0.01205, audio_tagging_loss=0.0092, over 2849054.72 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:56:20,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0
2023-11-27 15:56:24,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3129920.0, ans=0.125
2023-11-27 15:56:28,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=22.5
2023-11-27 15:56:31,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3129920.0, ans=0.2
2023-11-27 15:56:37,517 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469500
2023-11-27 15:56:37,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3129986.6666666665, ans=0.2
2023-11-27 15:56:40,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3129986.6666666665, ans=0.125
2023-11-27 15:56:44,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.483e+01 9.154e+01 9.792e+01 1.177e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-27 15:56:55,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3130053.3333333335, ans=0.125
2023-11-27 15:56:55,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3130053.3333333335, ans=0.125
2023-11-27 15:56:58,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3130053.3333333335, ans=0.125
2023-11-27 15:56:58,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=22.5
2023-11-27 15:57:11,357 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 600, loss[loss=0.07046, simple_loss=0.1017, pruned_loss=0.01225, audio_tagging_loss=0.007334, over 15670.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08843, pruned_loss=0.0122, audio_tagging_loss=0.009231, over 2900111.90 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
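[Editor's note] tot_loss is reported "over N frames", where N resets at the epoch boundary (15514 frames at epoch 40, batch 0; ~0.7M by batch 50; ~3M once saturated) and takes fractional values such as 2795336.37. The fractional counts suggest a decayed, frame-weighted running average rather than a plain cumulative sum. A sketch under that assumption (the decay constant is a guess):

    class RunningLoss:
        """Frame-weighted running average with exponential forgetting."""
        def __init__(self, decay: float = 0.995):  # decay value is a guess
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: int) -> float:
            self.weighted_loss = self.decay * self.weighted_loss + loss * num_frames
            self.frames = self.decay * self.frames + num_frames
            return self.weighted_loss / self.frames  # the reported tot_loss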
2023-11-27 15:57:35,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469550
2023-11-27 15:57:41,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3130320.0, ans=0.125
2023-11-27 15:57:43,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3130320.0, ans=0.0
2023-11-27 15:57:47,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3130386.6666666665, ans=0.2
2023-11-27 15:57:48,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3130386.6666666665, ans=0.0
2023-11-27 15:57:55,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3130386.6666666665, ans=0.0
2023-11-27 15:58:02,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0
2023-11-27 15:58:09,178 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 650, loss[loss=0.06091, simple_loss=0.07253, pruned_loss=0.01203, audio_tagging_loss=0.01262, over 14828.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08772, pruned_loss=0.01212, audio_tagging_loss=0.009239, over 2932191.43 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 15:58:16,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3130520.0, ans=0.1
2023-11-27 15:58:22,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3130586.6666666665, ans=0.0
2023-11-27 15:58:26,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3130586.6666666665, ans=0.2
2023-11-27 15:58:28,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5
2023-11-27 15:58:32,562 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469600
2023-11-27 15:58:40,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.671e+01 9.149e+01 9.953e+01 1.299e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-27 15:59:00,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3130786.6666666665, ans=0.1
2023-11-27 15:59:07,566 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 700, loss[loss=0.09303, simple_loss=0.122, pruned_loss=0.02257, audio_tagging_loss=0.00948, over 16161.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08905, pruned_loss=0.01245, audio_tagging_loss=0.009184, over 2959532.10 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 8.0
2023-11-27 15:59:13,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.64 vs. limit=15.0
2023-11-27 15:59:23,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
2023-11-27 15:59:29,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469650
2023-11-27 15:59:47,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3131053.3333333335, ans=0.0
2023-11-27 15:59:48,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. limit=10.0
2023-11-27 15:59:49,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3131053.3333333335, ans=0.125
2023-11-27 15:59:56,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=8.0
2023-11-27 15:59:57,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3131120.0, ans=0.125
2023-11-27 16:00:01,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3131120.0, ans=0.025
2023-11-27 16:00:05,237 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 750, loss[loss=0.07952, simple_loss=0.1093, pruned_loss=0.01692, audio_tagging_loss=0.007935, over 15763.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09021, pruned_loss=0.01236, audio_tagging_loss=0.009004, over 2983715.53 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 8.0
2023-11-27 16:00:09,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3131186.6666666665, ans=0.5
2023-11-27 16:00:20,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3131253.3333333335, ans=0.1
2023-11-27 16:00:20,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3131253.3333333335, ans=0.2
2023-11-27 16:00:23,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3131253.3333333335, ans=0.125
2023-11-27 16:00:28,336 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469700
2023-11-27 16:00:36,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.681e+01 9.396e+01 9.945e+01 1.193e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-27 16:01:03,220 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 800, loss[loss=0.05133, simple_loss=0.07517, pruned_loss=0.007045, audio_tagging_loss=0.006701, over 15116.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09008, pruned_loss=0.01237, audio_tagging_loss=0.00913, over 3001120.57 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:01:04,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0
2023-11-27 16:01:24,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3131586.6666666665, ans=0.125
2023-11-27 16:01:26,304 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469750
2023-11-27 16:01:29,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3131653.3333333335, ans=0.125
2023-11-27 16:01:31,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3131653.3333333335, ans=0.1
2023-11-27 16:01:38,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3131720.0, ans=0.0
2023-11-27 16:01:40,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0
2023-11-27 16:01:53,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3131786.6666666665, ans=0.2
2023-11-27 16:01:55,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3131786.6666666665, ans=0.0
2023-11-27 16:02:00,638 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 850, loss[loss=0.08781, simple_loss=0.1288, pruned_loss=0.0143, audio_tagging_loss=0.009104, over 16415.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09092, pruned_loss=0.01245, audio_tagging_loss=0.009188, over 3015021.82 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:02:10,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3131920.0, ans=0.0
2023-11-27 16:02:22,657 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469800
2023-11-27 16:02:23,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=15.0
2023-11-27 16:02:31,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.724e+01 9.421e+01 1.007e+02 1.369e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-27 16:02:44,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3132053.3333333335, ans=0.025
2023-11-27 16:02:57,919 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 900, loss[loss=0.06329, simple_loss=0.08595, pruned_loss=0.01328, audio_tagging_loss=0.007033, over 15159.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09026, pruned_loss=0.01232, audio_tagging_loss=0.009235, over 3028756.36 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:03:03,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0
2023-11-27 16:03:20,883 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469850
2023-11-27 16:03:26,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3132320.0, ans=0.0
2023-11-27 16:03:42,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3132386.6666666665, ans=0.125
2023-11-27 16:03:55,284 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 950, loss[loss=0.04831, simple_loss=0.06264, pruned_loss=0.005874, audio_tagging_loss=0.01111, over 15033.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09112, pruned_loss=0.01236, audio_tagging_loss=0.009032, over 3027766.48 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:04:12,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3132586.6666666665, ans=0.5
2023-11-27 16:04:19,227 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469900
2023-11-27 16:04:26,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.703e+01 9.559e+01 1.057e+02 1.419e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-27 16:04:39,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3132720.0, ans=0.125
2023-11-27 16:04:39,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3132720.0, ans=0.0
2023-11-27 16:04:53,152 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1000, loss[loss=0.05194, simple_loss=0.07017, pruned_loss=0.005711, audio_tagging_loss=0.01114, over 15951.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09094, pruned_loss=0.01257, audio_tagging_loss=0.008906, over 3027296.92 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:05:16,389 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469950
2023-11-27 16:05:19,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3132986.6666666665, ans=0.125
2023-11-27 16:05:20,826 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 16:05:29,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3133053.3333333335, ans=0.125
2023-11-27 16:05:38,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3133120.0, ans=0.05
2023-11-27 16:05:40,233 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 16:05:44,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3133120.0, ans=0.125
2023-11-27 16:05:51,390 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1050, loss[loss=0.06169, simple_loss=0.08196, pruned_loss=0.009749, audio_tagging_loss=0.01097, over 15415.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09036, pruned_loss=0.01262, audio_tagging_loss=0.008867, over 3027844.86 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
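[Editor's note] The WARNING above (and the identical ones elsewhere in this log) drops an AudioSet placeholder cut: after roughly 4x temporal subsampling, 100 input frames leave only 23 encoder frames, fewer than the 24 BPE tokens of the dummy transcript, so no transducer alignment exists. A sketch of that sanity check; the exact subsampling arithmetic is an assumption chosen to reproduce the logged 100 -> 23:

    def frames_after_subsampling(t: int) -> int:
        # Assumed form of the ~4x conv subsampling (reproduces 100 -> 23).
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least as many encoder frames as tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # the placeholder cuts above are excluded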
2023-11-27 16:06:06,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3133253.3333333335, ans=0.1
2023-11-27 16:06:07,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0
2023-11-27 16:06:09,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3133253.3333333335, ans=0.0
2023-11-27 16:06:14,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470000
2023-11-27 16:06:16,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3133320.0, ans=0.95
2023-11-27 16:06:22,104 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.934e+01 9.738e+01 1.038e+02 1.396e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-27 16:06:34,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3133386.6666666665, ans=0.125
2023-11-27 16:06:41,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=12.0
2023-11-27 16:06:41,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=15.0
2023-11-27 16:06:48,706 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1100, loss[loss=0.06358, simple_loss=0.08648, pruned_loss=0.01142, audio_tagging_loss=0.008921, over 14452.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08939, pruned_loss=0.01241, audio_tagging_loss=0.008803, over 3029654.49 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:06:54,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133520.0, ans=0.1
2023-11-27 16:06:55,069 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 16:06:58,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3133520.0, ans=0.05
2023-11-27 16:07:12,227 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470050
2023-11-27 16:07:39,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3133786.6666666665, ans=0.0
2023-11-27 16:07:46,796 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1150, loss[loss=0.06508, simple_loss=0.08987, pruned_loss=0.0134, audio_tagging_loss=0.006752, over 14422.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08888, pruned_loss=0.01247, audio_tagging_loss=0.008778, over 3029617.62 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
2023-11-27 16:07:59,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3133920.0, ans=0.125
2023-11-27 16:08:00,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3133920.0, ans=0.2
2023-11-27 16:08:10,130 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470100
2023-11-27 16:08:16,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3133986.6666666665, ans=0.2
2023-11-27 16:08:16,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3133986.6666666665, ans=0.2
2023-11-27 16:08:18,076 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.606e+01 9.243e+01 9.874e+01 1.339e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 16:08:23,154 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 16:08:25,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3134053.3333333335, ans=0.025
2023-11-27 16:08:37,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5
2023-11-27 16:08:43,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3134186.6666666665, ans=0.125
2023-11-27 16:08:44,537 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1200, loss[loss=0.07854, simple_loss=0.1045, pruned_loss=0.01766, audio_tagging_loss=0.008641, over 15123.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08986, pruned_loss=0.01292, audio_tagging_loss=0.008702, over 3024350.26 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:09:04,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3134253.3333333335, ans=0.125
2023-11-27 16:09:08,353 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470150
2023-11-27 16:09:28,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3134386.6666666665, ans=0.2
2023-11-27 16:09:35,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=15.0
2023-11-27 16:09:36,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3134453.3333333335, ans=0.1
2023-11-27 16:09:42,467 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1250, loss[loss=0.06226, simple_loss=0.08772, pruned_loss=0.009541, audio_tagging_loss=0.008856, over 15299.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09009, pruned_loss=0.01284, audio_tagging_loss=0.008591, over 3030320.18 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:09:51,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3134520.0, ans=0.1
2023-11-27 16:09:52,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3134520.0, ans=0.125
2023-11-27 16:10:03,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3134586.6666666665, ans=0.1
2023-11-27 16:10:05,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3134653.3333333335, ans=0.0
2023-11-27 16:10:06,001 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470200
2023-11-27 16:10:06,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3134653.3333333335, ans=0.125
2023-11-27 16:10:11,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3134653.3333333335, ans=0.125
2023-11-27 16:10:14,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.573e+01 9.266e+01 9.865e+01 1.338e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-27 16:10:17,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3134720.0, ans=0.0
2023-11-27 16:10:40,913 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1300, loss[loss=0.07633, simple_loss=0.1007, pruned_loss=0.01345, audio_tagging_loss=0.01255, over 14073.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08979, pruned_loss=0.01269, audio_tagging_loss=0.008748, over 3028336.85 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:11:03,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470250
2023-11-27 16:11:14,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3135053.3333333335, ans=0.0
2023-11-27 16:11:28,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3135120.0, ans=0.1
2023-11-27 16:11:29,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3135120.0, ans=0.2
2023-11-27 16:11:38,495 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1350, loss[loss=0.0856, simple_loss=0.1184, pruned_loss=0.01804, audio_tagging_loss=0.008369, over 15521.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08907, pruned_loss=0.01269, audio_tagging_loss=0.008788, over 3027844.07 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:12:01,828 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470300
2023-11-27 16:12:09,982 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.641e+01 9.240e+01 9.975e+01 1.416e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-27 16:12:23,876 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 16:12:24,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3135453.3333333335, ans=10.0
2023-11-27 16:12:29,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3135453.3333333335, ans=0.0
2023-11-27 16:12:33,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2023-11-27 16:12:33,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3135453.3333333335, ans=0.0
2023-11-27 16:12:36,592 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1400, loss[loss=0.07482, simple_loss=0.09494, pruned_loss=0.01671, audio_tagging_loss=0.01064, over 15321.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08888, pruned_loss=0.01267, audio_tagging_loss=0.008806, over 3032922.97 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0
2023-11-27 16:12:37,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0
2023-11-27 16:12:43,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3135520.0, ans=0.1
2023-11-27 16:12:57,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3135586.6666666665, ans=10.0
2023-11-27 16:12:59,808 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470350
2023-11-27 16:13:09,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0
2023-11-27 16:13:16,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0
2023-11-27 16:13:17,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3135720.0, ans=0.125
2023-11-27 16:13:28,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3135786.6666666665, ans=0.125
2023-11-27 16:13:34,926 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1450, loss[loss=0.0814, simple_loss=0.1069, pruned_loss=0.01942, audio_tagging_loss=0.008552, over 15280.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08941, pruned_loss=0.01275, audio_tagging_loss=0.008765, over 3040579.63 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0
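[Editor's note] In every "Clipping_scale=2.0, grad-norm quartiles ..." record in this log, the reported threshold equals 2.0 times the logged median (for example, 2 x 9.240e+01 = 1.848e+02 just above): gradients are clipped relative to a running median of recent gradient norms rather than to a fixed constant. A minimal sketch of that scheme; the window size and helper name are hypothetical:

    import torch

    def median_clip(recent_norms: list, new_norm: float,
                    clipping_scale: float = 2.0):
        # threshold = clipping_scale * median of recent gradient norms.
        recent_norms.append(new_norm)
        window = torch.tensor(recent_norms[-128:])    # window size is a guess
        q = torch.quantile(window, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()
        clip_factor = min(1.0, threshold / new_norm)  # multiply grads by this
        return clip_factor, q, threshold

    factor, quartiles, thr = median_clip([74.3, 86.4, 92.4, 99.8], 141.6)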
], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:13:44,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3135853.3333333335, ans=0.125 2023-11-27 16:13:52,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3135920.0, ans=0.125 2023-11-27 16:13:53,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3135920.0, ans=0.09899494936611666 2023-11-27 16:13:53,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3135920.0, ans=0.125 2023-11-27 16:13:57,714 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470400 2023-11-27 16:14:00,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3135986.6666666665, ans=0.5 2023-11-27 16:14:05,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.750e+01 9.513e+01 1.013e+02 1.289e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 16:14:11,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3136053.3333333335, ans=0.125 2023-11-27 16:14:14,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2023-11-27 16:14:28,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3136120.0, ans=0.125 2023-11-27 16:14:29,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3136120.0, ans=0.125 2023-11-27 16:14:30,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3136120.0, ans=0.5 2023-11-27 16:14:32,815 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1500, loss[loss=0.05155, simple_loss=0.05959, pruned_loss=0.0114, audio_tagging_loss=0.01035, over 14171.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08995, pruned_loss=0.01278, audio_tagging_loss=0.008797, over 3043126.35 frames. 
], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:14:48,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3136253.3333333335, ans=0.125 2023-11-27 16:14:56,064 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470450 2023-11-27 16:15:02,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3136320.0, ans=0.0 2023-11-27 16:15:07,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3136386.6666666665, ans=0.125 2023-11-27 16:15:08,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136386.6666666665, ans=0.1 2023-11-27 16:15:18,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3136453.3333333335, ans=0.0 2023-11-27 16:15:20,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3136453.3333333335, ans=0.0 2023-11-27 16:15:30,252 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1550, loss[loss=0.06274, simple_loss=0.07823, pruned_loss=0.01245, audio_tagging_loss=0.01118, over 17121.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09079, pruned_loss=0.01293, audio_tagging_loss=0.008909, over 3036549.46 frames. ], batch size: 66, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:15:38,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3136520.0, ans=0.0 2023-11-27 16:15:40,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3136520.0, ans=0.07 2023-11-27 16:15:40,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3136520.0, ans=0.2 2023-11-27 16:15:53,754 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470500 2023-11-27 16:16:03,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.701e+01 9.341e+01 1.017e+02 1.377e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:16:03,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3136653.3333333335, ans=0.05 2023-11-27 16:16:08,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3136720.0, ans=0.0 2023-11-27 16:16:11,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3136720.0, ans=0.0 2023-11-27 16:16:20,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=22.5 2023-11-27 16:16:28,063 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1600, loss[loss=0.07454, simple_loss=0.1104, pruned_loss=0.01041, audio_tagging_loss=0.008928, over 16452.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09093, pruned_loss=0.01307, audio_tagging_loss=0.008911, over 3045046.48 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:16:28,225 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:16:31,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3136853.3333333335, ans=0.0 2023-11-27 16:16:33,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3136853.3333333335, ans=0.125 2023-11-27 16:16:50,907 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470550 2023-11-27 16:16:56,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3136986.6666666665, ans=0.0 2023-11-27 16:17:00,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0 2023-11-27 16:17:07,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3137053.3333333335, ans=0.125 2023-11-27 16:17:20,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3137120.0, ans=0.0 2023-11-27 16:17:26,218 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1650, loss[loss=0.07217, simple_loss=0.0962, pruned_loss=0.01432, audio_tagging_loss=0.009753, over 14994.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09103, pruned_loss=0.01304, audio_tagging_loss=0.00897, over 3048524.73 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:17:28,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3137186.6666666665, ans=0.0 2023-11-27 16:17:28,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3137186.6666666665, ans=0.125 2023-11-27 16:17:35,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3137186.6666666665, ans=0.2 2023-11-27 16:17:48,368 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470600 2023-11-27 16:17:58,999 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.945e+01 9.413e+01 1.026e+02 1.249e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 16:17:59,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3137320.0, ans=0.0 2023-11-27 16:18:02,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0 2023-11-27 16:18:14,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3137453.3333333335, ans=0.125 2023-11-27 16:18:23,907 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1700, loss[loss=0.06785, simple_loss=0.09319, pruned_loss=0.01178, audio_tagging_loss=0.009473, over 15409.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09063, pruned_loss=0.01295, audio_tagging_loss=0.00895, over 3055773.46 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:18:31,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3137520.0, ans=0.0 2023-11-27 16:18:34,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3137586.6666666665, ans=0.0 2023-11-27 16:18:38,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3137586.6666666665, ans=0.0 2023-11-27 16:18:40,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2023-11-27 16:18:43,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3137586.6666666665, ans=0.2 2023-11-27 16:18:47,171 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470650 2023-11-27 16:18:51,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137653.3333333335, ans=0.1 2023-11-27 16:18:58,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3137720.0, ans=0.125 2023-11-27 16:19:21,610 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1750, loss[loss=0.05093, simple_loss=0.06713, pruned_loss=0.008592, audio_tagging_loss=0.008773, over 15302.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.0903, pruned_loss=0.01291, audio_tagging_loss=0.008958, over 3049340.87 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:19:30,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=15.0 2023-11-27 16:19:32,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3137920.0, ans=0.125 2023-11-27 16:19:41,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-27 16:19:42,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-27 16:19:44,786 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470700 2023-11-27 16:19:52,542 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:19:54,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.663e+01 9.086e+01 9.649e+01 1.198e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-27 16:19:59,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3138053.3333333335, ans=0.125 2023-11-27 16:20:19,418 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1800, loss[loss=0.07893, simple_loss=0.1138, pruned_loss=0.015, audio_tagging_loss=0.007044, over 14711.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08975, pruned_loss=0.0127, audio_tagging_loss=0.008981, over 3056449.96 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:20:21,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.85 vs. 
limit=12.0 2023-11-27 16:20:33,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138253.3333333335, ans=0.1 2023-11-27 16:20:41,854 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470750 2023-11-27 16:20:47,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-27 16:20:57,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=12.0 2023-11-27 16:21:07,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3138453.3333333335, ans=0.1 2023-11-27 16:21:16,679 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1850, loss[loss=0.09042, simple_loss=0.1334, pruned_loss=0.01924, audio_tagging_loss=0.004501, over 16424.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09039, pruned_loss=0.01272, audio_tagging_loss=0.008873, over 3053132.85 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:21:16,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138520.0, ans=0.1 2023-11-27 16:21:21,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3138520.0, ans=0.125 2023-11-27 16:21:40,193 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470800 2023-11-27 16:21:44,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3138653.3333333335, ans=0.125 2023-11-27 16:21:50,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.620e+01 9.187e+01 9.919e+01 1.245e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-27 16:21:53,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3138720.0, ans=0.2 2023-11-27 16:22:14,809 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1900, loss[loss=0.05254, simple_loss=0.07083, pruned_loss=0.00669, audio_tagging_loss=0.01044, over 16023.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08887, pruned_loss=0.01251, audio_tagging_loss=0.008808, over 3057326.80 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:22:38,557 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470850 2023-11-27 16:23:00,707 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0 2023-11-27 16:23:12,710 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1950, loss[loss=0.06879, simple_loss=0.09916, pruned_loss=0.01396, audio_tagging_loss=0.005246, over 14685.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08886, pruned_loss=0.01254, audio_tagging_loss=0.008787, over 3061751.22 frames. 
], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:23:32,533 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:23:34,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3139320.0, ans=0.0 2023-11-27 16:23:35,695 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470900 2023-11-27 16:23:45,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3139320.0, ans=0.125 2023-11-27 16:23:46,453 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.576e+01 9.160e+01 9.774e+01 1.352e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 16:23:47,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3139386.6666666665, ans=0.125 2023-11-27 16:23:51,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3139386.6666666665, ans=0.0 2023-11-27 16:23:51,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3139386.6666666665, ans=0.125 2023-11-27 16:24:08,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2023-11-27 16:24:10,836 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2000, loss[loss=0.08049, simple_loss=0.1165, pruned_loss=0.01377, audio_tagging_loss=0.008476, over 14126.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08938, pruned_loss=0.01265, audio_tagging_loss=0.008787, over 3054415.28 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:24:18,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3139520.0, ans=0.0 2023-11-27 16:24:33,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470950 2023-11-27 16:24:49,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3139720.0, ans=0.125 2023-11-27 16:24:56,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2023-11-27 16:24:58,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3139786.6666666665, ans=0.125 2023-11-27 16:25:03,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3139786.6666666665, ans=0.1 2023-11-27 16:25:07,720 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2050, loss[loss=0.05822, simple_loss=0.08494, pruned_loss=0.008191, audio_tagging_loss=0.00756, over 15611.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08926, pruned_loss=0.01253, audio_tagging_loss=0.008775, over 3055572.17 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:25:20,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. 
limit=6.0 2023-11-27 16:25:31,911 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471000 2023-11-27 16:25:39,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3139986.6666666665, ans=0.0 2023-11-27 16:25:41,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.831e+01 9.330e+01 1.029e+02 1.229e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 16:25:42,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-27 16:25:44,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-27 16:25:46,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-27 16:26:05,827 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2100, loss[loss=0.07742, simple_loss=0.1051, pruned_loss=0.01687, audio_tagging_loss=0.008009, over 15235.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08978, pruned_loss=0.01254, audio_tagging_loss=0.008705, over 3049227.94 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:26:07,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-27 16:26:13,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3140186.6666666665, ans=0.0 2023-11-27 16:26:17,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3140253.3333333335, ans=0.0 2023-11-27 16:26:18,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3140253.3333333335, ans=0.05 2023-11-27 16:26:29,101 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471050 2023-11-27 16:26:58,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3140453.3333333335, ans=0.125 2023-11-27 16:27:03,645 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2150, loss[loss=0.06367, simple_loss=0.08346, pruned_loss=0.01326, audio_tagging_loss=0.008675, over 13766.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08981, pruned_loss=0.01255, audio_tagging_loss=0.008723, over 3044730.92 frames. ], batch size: 51, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:27:22,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3140586.6666666665, ans=0.125 2023-11-27 16:27:27,017 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471100 2023-11-27 16:27:36,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.778e+01 9.339e+01 1.008e+02 1.312e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:27:42,447 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:27:44,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3140720.0, ans=0.1 2023-11-27 16:27:48,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=12.0 2023-11-27 16:27:49,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3140786.6666666665, ans=0.0 2023-11-27 16:28:01,142 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2200, loss[loss=0.05865, simple_loss=0.08144, pruned_loss=0.007345, audio_tagging_loss=0.01059, over 14326.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08972, pruned_loss=0.01246, audio_tagging_loss=0.008739, over 3040947.41 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:28:24,798 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471150 2023-11-27 16:28:36,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3141053.3333333335, ans=0.0 2023-11-27 16:28:48,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3141120.0, ans=0.125 2023-11-27 16:28:54,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3141120.0, ans=0.125 2023-11-27 16:28:57,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3141186.6666666665, ans=0.125 2023-11-27 16:28:58,791 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2250, loss[loss=0.07652, simple_loss=0.1092, pruned_loss=0.01603, audio_tagging_loss=0.005896, over 15122.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.0907, pruned_loss=0.01267, audio_tagging_loss=0.008714, over 3043467.29 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:29:16,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3141253.3333333335, ans=0.125 2023-11-27 16:29:21,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471200 2023-11-27 16:29:30,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=15.0 2023-11-27 16:29:33,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.710e+01 9.342e+01 1.003e+02 1.212e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:29:34,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. 
limit=15.0 2023-11-27 16:29:38,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3141386.6666666665, ans=0.125 2023-11-27 16:29:44,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3141453.3333333335, ans=0.0 2023-11-27 16:29:47,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3141453.3333333335, ans=0.125 2023-11-27 16:29:52,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141453.3333333335, ans=0.1 2023-11-27 16:29:52,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141453.3333333335, ans=0.1 2023-11-27 16:29:57,615 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2300, loss[loss=0.07585, simple_loss=0.1104, pruned_loss=0.01437, audio_tagging_loss=0.006296, over 16011.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08995, pruned_loss=0.01252, audio_tagging_loss=0.00878, over 3045841.84 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:30:05,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=12.0 2023-11-27 16:30:20,336 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471250 2023-11-27 16:30:33,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3141720.0, ans=0.125 2023-11-27 16:30:43,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3141786.6666666665, ans=0.1 2023-11-27 16:30:45,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-27 16:30:51,096 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:30:52,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141786.6666666665, ans=0.1 2023-11-27 16:30:54,403 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2350, loss[loss=0.08331, simple_loss=0.1067, pruned_loss=0.01861, audio_tagging_loss=0.01137, over 13787.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09086, pruned_loss=0.01255, audio_tagging_loss=0.008848, over 3051639.28 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:31:12,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.50 vs. 
limit=15.0 2023-11-27 16:31:15,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3141920.0, ans=0.0 2023-11-27 16:31:16,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2023-11-27 16:31:18,072 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471300 2023-11-27 16:31:24,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3141986.6666666665, ans=0.0 2023-11-27 16:31:29,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.596e+01 9.412e+01 1.004e+02 1.275e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 16:31:36,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142053.3333333335, ans=0.1 2023-11-27 16:31:37,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3142053.3333333335, ans=0.125 2023-11-27 16:31:49,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3142120.0, ans=0.125 2023-11-27 16:31:52,505 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2400, loss[loss=0.07916, simple_loss=0.1057, pruned_loss=0.01624, audio_tagging_loss=0.01009, over 16193.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09082, pruned_loss=0.01272, audio_tagging_loss=0.009047, over 3054274.43 frames. ], batch size: 63, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:32:02,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142186.6666666665, ans=0.1 2023-11-27 16:32:07,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3142253.3333333335, ans=0.0 2023-11-27 16:32:15,876 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471350 2023-11-27 16:32:26,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142386.6666666665, ans=0.1 2023-11-27 16:32:39,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142453.3333333335, ans=0.1 2023-11-27 16:32:47,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.35 vs. limit=5.0 2023-11-27 16:32:49,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3142520.0, ans=0.125 2023-11-27 16:32:50,643 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2450, loss[loss=0.06545, simple_loss=0.09127, pruned_loss=0.01203, audio_tagging_loss=0.00779, over 15511.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09076, pruned_loss=0.01278, audio_tagging_loss=0.009082, over 3055676.41 frames. 
], batch size: 62, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:32:56,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3142520.0, ans=0.125 2023-11-27 16:33:03,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3142586.6666666665, ans=0.0 2023-11-27 16:33:13,842 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471400 2023-11-27 16:33:25,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.627e+01 9.278e+01 9.943e+01 1.246e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 16:33:30,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3142720.0, ans=0.1 2023-11-27 16:33:35,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3142720.0, ans=0.0 2023-11-27 16:33:36,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3142786.6666666665, ans=0.2 2023-11-27 16:33:40,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3142786.6666666665, ans=0.1 2023-11-27 16:33:41,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3142786.6666666665, ans=0.0 2023-11-27 16:33:44,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3142786.6666666665, ans=0.0 2023-11-27 16:33:48,568 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2500, loss[loss=0.06175, simple_loss=0.08574, pruned_loss=0.008527, audio_tagging_loss=0.01035, over 14674.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09034, pruned_loss=0.0127, audio_tagging_loss=0.009078, over 3055526.25 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:34:10,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3142986.6666666665, ans=0.0 2023-11-27 16:34:11,477 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471450 2023-11-27 16:34:27,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3143053.3333333335, ans=0.0 2023-11-27 16:34:28,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3143053.3333333335, ans=0.0 2023-11-27 16:34:31,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0 2023-11-27 16:34:37,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3143120.0, ans=0.125 2023-11-27 16:34:44,599 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:34:44,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2023-11-27 16:34:46,506 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2550, loss[loss=0.06562, simple_loss=0.08729, pruned_loss=0.01103, audio_tagging_loss=0.01095, over 15818.00 frames. 
], tot_loss[loss=0.06694, simple_loss=0.09039, pruned_loss=0.01276, audio_tagging_loss=0.008989, over 3053420.76 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:35:09,322 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471500 2023-11-27 16:35:21,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.534e+01 9.124e+01 9.895e+01 1.510e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-27 16:35:44,624 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2600, loss[loss=0.07214, simple_loss=0.0974, pruned_loss=0.01538, audio_tagging_loss=0.008059, over 16727.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09038, pruned_loss=0.01274, audio_tagging_loss=0.008777, over 3054975.39 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:35:51,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3143520.0, ans=0.125 2023-11-27 16:36:07,252 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471550 2023-11-27 16:36:38,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3143786.6666666665, ans=0.125 2023-11-27 16:36:41,776 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2650, loss[loss=0.05734, simple_loss=0.08425, pruned_loss=0.009093, audio_tagging_loss=0.006122, over 14393.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09029, pruned_loss=0.01276, audio_tagging_loss=0.008782, over 3053102.65 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:36:44,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2023-11-27 16:36:47,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2023-11-27 16:37:05,510 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471600 2023-11-27 16:37:08,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3143986.6666666665, ans=0.125 2023-11-27 16:37:18,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.789e+01 9.225e+01 1.011e+02 1.898e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-27 16:37:41,107 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2700, loss[loss=0.06562, simple_loss=0.09658, pruned_loss=0.009795, audio_tagging_loss=0.007541, over 15102.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09046, pruned_loss=0.01269, audio_tagging_loss=0.0087, over 3055602.52 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:37:42,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3144186.6666666665, ans=0.125 2023-11-27 16:37:44,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=22.5 2023-11-27 16:37:46,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3144186.6666666665, ans=0.0 2023-11-27 16:37:50,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. 
limit=8.0 2023-11-27 16:37:57,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3144253.3333333335, ans=0.125 2023-11-27 16:37:57,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3144253.3333333335, ans=0.09899494936611666 2023-11-27 16:38:02,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3144320.0, ans=0.0 2023-11-27 16:38:03,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471650 2023-11-27 16:38:25,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3144453.3333333335, ans=0.95 2023-11-27 16:38:38,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2023-11-27 16:38:38,649 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2750, loss[loss=0.0959, simple_loss=0.1357, pruned_loss=0.02285, audio_tagging_loss=0.005182, over 15309.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08948, pruned_loss=0.01268, audio_tagging_loss=0.008749, over 3059165.54 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:39:00,717 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471700 2023-11-27 16:39:03,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0 2023-11-27 16:39:14,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.654e+01 9.307e+01 9.925e+01 1.318e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 16:39:31,246 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:39:35,721 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2800, loss[loss=0.06365, simple_loss=0.09062, pruned_loss=0.01091, audio_tagging_loss=0.007424, over 15526.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08796, pruned_loss=0.01233, audio_tagging_loss=0.008702, over 3055001.36 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:39:54,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3144920.0, ans=10.0 2023-11-27 16:39:59,057 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471750 2023-11-27 16:40:06,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3144986.6666666665, ans=0.125 2023-11-27 16:40:33,062 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2850, loss[loss=0.08214, simple_loss=0.1095, pruned_loss=0.01929, audio_tagging_loss=0.008087, over 15082.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08936, pruned_loss=0.01279, audio_tagging_loss=0.008665, over 3054149.99 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:40:39,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3145186.6666666665, ans=10.0 2023-11-27 16:40:44,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3145253.3333333335, ans=0.1 2023-11-27 16:40:46,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.12 vs. limit=6.0 2023-11-27 16:40:56,797 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471800 2023-11-27 16:41:02,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=22.5 2023-11-27 16:41:10,170 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.689e+01 9.342e+01 1.021e+02 1.296e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:41:12,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3145386.6666666665, ans=0.125 2023-11-27 16:41:27,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=22.5 2023-11-27 16:41:31,281 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2900, loss[loss=0.06453, simple_loss=0.08707, pruned_loss=0.01102, audio_tagging_loss=0.009969, over 14247.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08924, pruned_loss=0.01281, audio_tagging_loss=0.008668, over 3049248.03 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:41:35,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3145520.0, ans=0.125 2023-11-27 16:41:48,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3145586.6666666665, ans=0.0 2023-11-27 16:41:54,040 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471850 2023-11-27 16:41:56,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-27 16:41:56,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3145653.3333333335, ans=0.0 2023-11-27 16:42:01,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-27 16:42:08,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3145720.0, ans=0.0 2023-11-27 16:42:08,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3145720.0, ans=0.0 2023-11-27 16:42:19,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-27 16:42:28,696 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2950, loss[loss=0.04882, simple_loss=0.07128, pruned_loss=0.006733, audio_tagging_loss=0.006451, over 14644.00 frames. 
], tot_loss[loss=0.06607, simple_loss=0.08933, pruned_loss=0.01263, audio_tagging_loss=0.008775, over 3051706.69 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:42:30,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3145853.3333333335, ans=0.125 2023-11-27 16:42:38,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.55 vs. limit=10.0 2023-11-27 16:42:39,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0 2023-11-27 16:42:52,179 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471900 2023-11-27 16:42:54,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2023-11-27 16:43:05,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.644e+01 9.313e+01 9.896e+01 1.371e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 16:43:21,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3146120.0, ans=0.0 2023-11-27 16:43:25,553 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3000, loss[loss=0.06889, simple_loss=0.09097, pruned_loss=0.01255, audio_tagging_loss=0.01085, over 15653.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08931, pruned_loss=0.01252, audio_tagging_loss=0.0089, over 3055212.82 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:43:25,553 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 16:43:50,459 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5156, 3.4782, 3.8161, 3.6465], device='cuda:3') 2023-11-27 16:44:00,560 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.0576, simple_loss=0.0507, pruned_loss=0.005183, audio_tagging_loss=0.02707, over 4681554.00 frames. 2023-11-27 16:44:00,561 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 16:44:00,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3146186.6666666665, ans=0.07 2023-11-27 16:44:00,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3146186.6666666665, ans=0.1 2023-11-27 16:44:22,687 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471950 2023-11-27 16:44:24,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2023-11-27 16:44:57,626 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3050, loss[loss=0.07592, simple_loss=0.1178, pruned_loss=0.01008, audio_tagging_loss=0.006939, over 15752.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08948, pruned_loss=0.01241, audio_tagging_loss=0.008893, over 3048557.60 frames. 
], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:44:57,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3146520.0, ans=0.125 2023-11-27 16:45:01,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.13 vs. limit=15.0 2023-11-27 16:45:13,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3146586.6666666665, ans=0.125 2023-11-27 16:45:20,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472000 2023-11-27 16:45:37,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.960e+01 9.776e+01 1.063e+02 1.311e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-27 16:45:37,479 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:45:48,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3146786.6666666665, ans=10.0 2023-11-27 16:45:49,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3146786.6666666665, ans=0.1 2023-11-27 16:45:52,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.16 vs. limit=15.0 2023-11-27 16:45:57,821 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3100, loss[loss=0.07723, simple_loss=0.1149, pruned_loss=0.01289, audio_tagging_loss=0.006909, over 16372.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09013, pruned_loss=0.01261, audio_tagging_loss=0.008882, over 3045638.69 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:46:05,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-27 16:46:07,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3146853.3333333335, ans=0.125 2023-11-27 16:46:18,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3146920.0, ans=0.125 2023-11-27 16:46:21,469 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472050 2023-11-27 16:46:21,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3146986.6666666665, ans=0.0 2023-11-27 16:46:27,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.80 vs. 
limit=22.5 2023-11-27 16:46:30,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3146986.6666666665, ans=0.0 2023-11-27 16:46:41,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3147053.3333333335, ans=0.1 2023-11-27 16:46:41,816 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:46:55,365 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3150, loss[loss=0.1008, simple_loss=0.1356, pruned_loss=0.02407, audio_tagging_loss=0.00899, over 14792.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09087, pruned_loss=0.01267, audio_tagging_loss=0.008902, over 3042942.29 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:47:00,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3147186.6666666665, ans=0.125 2023-11-27 16:47:18,649 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472100 2023-11-27 16:47:32,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.709e+01 9.238e+01 9.861e+01 1.387e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 16:47:32,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2023-11-27 16:47:36,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3147386.6666666665, ans=0.125 2023-11-27 16:47:38,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2023-11-27 16:47:40,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3147453.3333333335, ans=0.1 2023-11-27 16:47:52,700 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3200, loss[loss=0.06762, simple_loss=0.08797, pruned_loss=0.01354, audio_tagging_loss=0.0101, over 16229.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09016, pruned_loss=0.01254, audio_tagging_loss=0.008941, over 3040834.35 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:47:58,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3147520.0, ans=0.2 2023-11-27 16:48:03,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3147586.6666666665, ans=0.125 2023-11-27 16:48:15,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472150 2023-11-27 16:48:36,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3147720.0, ans=0.125 2023-11-27 16:48:39,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2023-11-27 16:48:41,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.61 vs. 
limit=15.0 2023-11-27 16:48:50,356 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3250, loss[loss=0.07848, simple_loss=0.09888, pruned_loss=0.02035, audio_tagging_loss=0.008689, over 15278.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09002, pruned_loss=0.01256, audio_tagging_loss=0.009022, over 3044290.50 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:48:54,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3147853.3333333335, ans=0.0 2023-11-27 16:48:56,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3147853.3333333335, ans=0.125 2023-11-27 16:49:02,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3147920.0, ans=0.2 2023-11-27 16:49:14,111 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472200 2023-11-27 16:49:27,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2023-11-27 16:49:28,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.717e+01 9.369e+01 9.960e+01 1.192e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 16:49:48,406 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3300, loss[loss=0.07466, simple_loss=0.09684, pruned_loss=0.01713, audio_tagging_loss=0.009113, over 15573.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09, pruned_loss=0.01266, audio_tagging_loss=0.009117, over 3049486.26 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:49:49,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3148186.6666666665, ans=0.1 2023-11-27 16:49:52,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3148186.6666666665, ans=0.125 2023-11-27 16:50:11,714 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472250 2023-11-27 16:50:17,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3148320.0, ans=0.125 2023-11-27 16:50:30,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3148386.6666666665, ans=0.125 2023-11-27 16:50:39,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3148453.3333333335, ans=0.1 2023-11-27 16:50:46,540 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3350, loss[loss=0.06473, simple_loss=0.08098, pruned_loss=0.01475, audio_tagging_loss=0.009491, over 14039.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08962, pruned_loss=0.01259, audio_tagging_loss=0.008991, over 3052750.30 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:50:49,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.14 vs. 
limit=10.0 2023-11-27 16:51:01,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3148586.6666666665, ans=0.1 2023-11-27 16:51:09,640 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472300 2023-11-27 16:51:18,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3148653.3333333335, ans=0.0 2023-11-27 16:51:24,406 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.635e+01 9.292e+01 9.708e+01 1.105e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 16:51:43,854 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3400, loss[loss=0.04858, simple_loss=0.06694, pruned_loss=0.007801, audio_tagging_loss=0.00731, over 14835.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09009, pruned_loss=0.01257, audio_tagging_loss=0.008861, over 3052165.56 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:51:48,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2023-11-27 16:51:54,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3148920.0, ans=0.0 2023-11-27 16:52:07,207 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472350 2023-11-27 16:52:32,716 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:52:41,857 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3450, loss[loss=0.05032, simple_loss=0.06107, pruned_loss=0.00898, audio_tagging_loss=0.0108, over 14117.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09078, pruned_loss=0.01262, audio_tagging_loss=0.008779, over 3048933.33 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:53:05,244 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472400 2023-11-27 16:53:17,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2023-11-27 16:53:18,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3149386.6666666665, ans=0.0 2023-11-27 16:53:20,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.772e+01 9.450e+01 1.013e+02 1.492e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 16:53:39,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3149520.0, ans=0.2 2023-11-27 16:53:39,855 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3500, loss[loss=0.05875, simple_loss=0.07795, pruned_loss=0.01274, audio_tagging_loss=0.007033, over 16342.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09066, pruned_loss=0.01272, audio_tagging_loss=0.008709, over 3049612.42 frames. ], batch size: 64, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:53:43,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3149520.0, ans=0.0 2023-11-27 16:54:03,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.10 vs. 
limit=15.0 2023-11-27 16:54:03,444 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472450 2023-11-27 16:54:13,257 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:54:37,432 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3550, loss[loss=0.04788, simple_loss=0.05883, pruned_loss=0.008019, audio_tagging_loss=0.01045, over 17259.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09041, pruned_loss=0.01266, audio_tagging_loss=0.008744, over 3052946.87 frames. ], batch size: 68, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:55:00,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472500 2023-11-27 16:55:06,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3149986.6666666665, ans=0.125 2023-11-27 16:55:07,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3149986.6666666665, ans=0.125 2023-11-27 16:55:15,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.604e+01 9.006e+01 9.735e+01 1.232e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-27 16:55:18,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2023-11-27 16:55:24,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.92 vs. limit=15.0 2023-11-27 16:55:35,491 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3600, loss[loss=0.08482, simple_loss=0.1156, pruned_loss=0.01927, audio_tagging_loss=0.007739, over 15606.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08988, pruned_loss=0.01262, audio_tagging_loss=0.008753, over 3049356.48 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:55:53,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3150253.3333333335, ans=0.1 2023-11-27 16:55:58,143 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472550 2023-11-27 16:56:04,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3150320.0, ans=0.125 2023-11-27 16:56:04,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3150320.0, ans=0.2 2023-11-27 16:56:08,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. 
limit=6.0 2023-11-27 16:56:20,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3150453.3333333335, ans=0.1 2023-11-27 16:56:21,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3150453.3333333335, ans=0.04949747468305833 2023-11-27 16:56:28,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2023-11-27 16:56:30,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3150453.3333333335, ans=0.125 2023-11-27 16:56:33,353 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3650, loss[loss=0.07793, simple_loss=0.1099, pruned_loss=0.01367, audio_tagging_loss=0.009292, over 15246.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09003, pruned_loss=0.01264, audio_tagging_loss=0.00872, over 3045079.19 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:56:56,724 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472600 2023-11-27 16:56:58,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=15.0 2023-11-27 16:57:11,658 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.799e+01 9.366e+01 9.854e+01 1.150e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 16:57:30,576 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3700, loss[loss=0.06374, simple_loss=0.07527, pruned_loss=0.01335, audio_tagging_loss=0.01275, over 15067.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09152, pruned_loss=0.01284, audio_tagging_loss=0.00865, over 3053878.12 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:57:32,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3150853.3333333335, ans=0.09899494936611666 2023-11-27 16:57:34,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=22.5 2023-11-27 16:57:38,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3150853.3333333335, ans=0.05 2023-11-27 16:57:40,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3150853.3333333335, ans=0.125 2023-11-27 16:57:42,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.35 vs. 
limit=15.0 2023-11-27 16:57:52,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3150986.6666666665, ans=0.2 2023-11-27 16:57:53,922 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472650 2023-11-27 16:58:07,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3151053.3333333335, ans=0.125 2023-11-27 16:58:19,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3151120.0, ans=0.125 2023-11-27 16:58:19,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2023-11-27 16:58:28,641 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3750, loss[loss=0.0492, simple_loss=0.06771, pruned_loss=0.005534, audio_tagging_loss=0.009814, over 14756.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09225, pruned_loss=0.013, audio_tagging_loss=0.00872, over 3050622.56 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:58:42,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3151253.3333333335, ans=0.125 2023-11-27 16:58:51,246 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472700 2023-11-27 16:59:04,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3151386.6666666665, ans=0.125 2023-11-27 16:59:07,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.997e+01 9.607e+01 1.030e+02 1.522e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 16:59:12,467 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:59:26,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0 2023-11-27 16:59:26,481 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3800, loss[loss=0.05565, simple_loss=0.07206, pruned_loss=0.009198, audio_tagging_loss=0.01042, over 14533.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.0921, pruned_loss=0.01304, audio_tagging_loss=0.008767, over 3045642.05 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 16:59:49,615 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472750 2023-11-27 16:59:57,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3151653.3333333335, ans=0.2 2023-11-27 17:00:20,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3151786.6666666665, ans=0.125 2023-11-27 17:00:22,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. 
limit=15.0 2023-11-27 17:00:23,107 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3850, loss[loss=0.07002, simple_loss=0.09657, pruned_loss=0.01556, audio_tagging_loss=0.006175, over 15514.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09135, pruned_loss=0.01282, audio_tagging_loss=0.008853, over 3040777.14 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:00:24,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3151853.3333333335, ans=0.125 2023-11-27 17:00:39,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3151920.0, ans=0.125 2023-11-27 17:00:46,431 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472800 2023-11-27 17:00:46,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3151986.6666666665, ans=0.125 2023-11-27 17:01:02,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.917e+01 9.426e+01 9.996e+01 1.241e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 17:01:21,832 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3900, loss[loss=0.08122, simple_loss=0.1117, pruned_loss=0.01412, audio_tagging_loss=0.01127, over 15556.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09098, pruned_loss=0.01286, audio_tagging_loss=0.008845, over 3042419.55 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:01:23,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.93 vs. limit=10.0 2023-11-27 17:01:30,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3152186.6666666665, ans=0.2 2023-11-27 17:01:31,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3152253.3333333335, ans=0.09899494936611666 2023-11-27 17:01:32,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3152253.3333333335, ans=0.125 2023-11-27 17:01:44,288 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472850 2023-11-27 17:01:54,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3152386.6666666665, ans=0.0 2023-11-27 17:01:58,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2023-11-27 17:02:10,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3152453.3333333335, ans=0.0 2023-11-27 17:02:18,846 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3950, loss[loss=0.05168, simple_loss=0.07338, pruned_loss=0.007488, audio_tagging_loss=0.007507, over 14091.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09112, pruned_loss=0.01289, audio_tagging_loss=0.008934, over 3040441.82 frames. 
], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:02:22,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3152520.0, ans=0.95 2023-11-27 17:02:22,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3152520.0, ans=0.5 2023-11-27 17:02:28,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2023-11-27 17:02:39,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3152586.6666666665, ans=0.125 2023-11-27 17:02:41,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472900 2023-11-27 17:02:56,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3152720.0, ans=0.125 2023-11-27 17:02:58,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.787e+01 9.336e+01 1.021e+02 1.304e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 17:02:59,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3152720.0, ans=0.125 2023-11-27 17:03:05,475 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:03:16,287 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4000, loss[loss=0.07816, simple_loss=0.1005, pruned_loss=0.01646, audio_tagging_loss=0.01145, over 15853.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.092, pruned_loss=0.01303, audio_tagging_loss=0.008992, over 3049245.26 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:03:35,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3152920.0, ans=0.125 2023-11-27 17:03:39,898 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472950 2023-11-27 17:03:59,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3153053.3333333335, ans=0.025 2023-11-27 17:03:59,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3153053.3333333335, ans=0.125 2023-11-27 17:04:01,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3153120.0, ans=0.125 2023-11-27 17:04:06,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3153120.0, ans=0.2 2023-11-27 17:04:09,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153120.0, ans=0.1 2023-11-27 17:04:11,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3153120.0, ans=0.2 2023-11-27 17:04:13,842 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4050, loss[loss=0.0858, simple_loss=0.1159, pruned_loss=0.01922, audio_tagging_loss=0.008617, over 15429.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09254, pruned_loss=0.01316, audio_tagging_loss=0.008965, over 3046579.93 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:04:21,528 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:04:22,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3153186.6666666665, ans=0.0 2023-11-27 17:04:37,464 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473000 2023-11-27 17:04:41,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3153320.0, ans=0.2 2023-11-27 17:04:51,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3153386.6666666665, ans=0.07 2023-11-27 17:04:53,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 9.021e+01 9.575e+01 1.043e+02 1.402e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 17:04:59,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3153453.3333333335, ans=0.125 2023-11-27 17:05:05,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3153453.3333333335, ans=0.1 2023-11-27 17:05:06,963 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:05:12,224 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4100, loss[loss=0.06107, simple_loss=0.08485, pruned_loss=0.01098, audio_tagging_loss=0.007668, over 13581.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09238, pruned_loss=0.01301, audio_tagging_loss=0.008946, over 3041856.62 frames. ], batch size: 52, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:05:13,569 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:05:19,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-27 17:05:31,717 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:05:34,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473050 2023-11-27 17:05:50,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3153720.0, ans=0.2 2023-11-27 17:06:08,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=15.0 2023-11-27 17:06:09,969 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4150, loss[loss=0.0703, simple_loss=0.1038, pruned_loss=0.01418, audio_tagging_loss=0.004227, over 15954.00 frames. ], tot_loss[loss=0.068, simple_loss=0.0923, pruned_loss=0.01305, audio_tagging_loss=0.008807, over 3040239.20 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:06:10,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3153853.3333333335, ans=0.1 2023-11-27 17:06:19,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3153853.3333333335, ans=0.04949747468305833 2023-11-27 17:06:22,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3153920.0, ans=0.125 2023-11-27 17:06:32,967 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473100 2023-11-27 17:06:42,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3153986.6666666665, ans=0.125 2023-11-27 17:06:48,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3154053.3333333335, ans=0.125 2023-11-27 17:06:50,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.636e+01 9.410e+01 1.013e+02 1.260e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 17:06:55,194 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:06:57,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3154120.0, ans=0.125 2023-11-27 17:06:59,803 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:06:59,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3154120.0, ans=0.125 2023-11-27 17:07:01,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3154120.0, ans=0.0 2023-11-27 17:07:07,795 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4200, loss[loss=0.08311, simple_loss=0.1224, pruned_loss=0.01432, audio_tagging_loss=0.007578, over 16047.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09173, pruned_loss=0.01286, audio_tagging_loss=0.008703, over 3047827.13 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:07:09,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3154186.6666666665, ans=0.125 2023-11-27 17:07:10,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3154186.6666666665, ans=0.0 2023-11-27 17:07:19,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.31 vs. 
limit=15.0 2023-11-27 17:07:23,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3154253.3333333335, ans=0.07 2023-11-27 17:07:25,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3154253.3333333335, ans=0.125 2023-11-27 17:07:30,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3154320.0, ans=0.125 2023-11-27 17:07:31,561 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473150 2023-11-27 17:07:49,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3154386.6666666665, ans=0.125 2023-11-27 17:07:53,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3154453.3333333335, ans=0.2 2023-11-27 17:07:54,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3154453.3333333335, ans=0.125 2023-11-27 17:08:05,637 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4250, loss[loss=0.0531, simple_loss=0.06776, pruned_loss=0.008816, audio_tagging_loss=0.01041, over 14826.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09121, pruned_loss=0.0127, audio_tagging_loss=0.008669, over 3049608.09 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:08:06,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=15.0 2023-11-27 17:08:18,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3154586.6666666665, ans=0.1 2023-11-27 17:08:28,846 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473200 2023-11-27 17:08:31,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3154653.3333333335, ans=0.125 2023-11-27 17:08:31,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3154653.3333333335, ans=0.125 2023-11-27 17:08:46,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.728e+01 9.330e+01 9.912e+01 1.216e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 17:09:04,284 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4300, loss[loss=0.0578, simple_loss=0.07963, pruned_loss=0.009338, audio_tagging_loss=0.008651, over 14721.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09197, pruned_loss=0.01273, audio_tagging_loss=0.008574, over 3045987.98 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:09:04,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-27 17:09:25,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-27 17:09:27,284 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473250 2023-11-27 17:09:31,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.48 vs. 
limit=22.5 2023-11-27 17:09:36,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3154986.6666666665, ans=0.0 2023-11-27 17:09:38,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3155053.3333333335, ans=0.125 2023-11-27 17:09:53,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3155120.0, ans=0.125 2023-11-27 17:09:53,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2023-11-27 17:10:00,571 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4350, loss[loss=0.04172, simple_loss=0.04959, pruned_loss=0.006794, audio_tagging_loss=0.01014, over 16729.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09127, pruned_loss=0.01258, audio_tagging_loss=0.008569, over 3049689.54 frames. ], batch size: 67, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:10:00,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2023-11-27 17:10:08,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3155186.6666666665, ans=0.125 2023-11-27 17:10:24,463 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473300 2023-11-27 17:10:31,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3155320.0, ans=0.125 2023-11-27 17:10:34,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3155386.6666666665, ans=0.125 2023-11-27 17:10:41,003 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.765e+01 9.493e+01 1.043e+02 1.484e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 17:10:58,643 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4400, loss[loss=0.05312, simple_loss=0.0655, pruned_loss=0.009643, audio_tagging_loss=0.01073, over 14696.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09005, pruned_loss=0.01248, audio_tagging_loss=0.008633, over 3048992.79 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:11:13,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5 2023-11-27 17:11:21,895 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473350 2023-11-27 17:11:31,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3155653.3333333335, ans=0.1 2023-11-27 17:11:34,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3155720.0, ans=0.125 2023-11-27 17:11:56,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3155853.3333333335, ans=0.0 2023-11-27 17:11:57,079 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4450, loss[loss=0.06993, simple_loss=0.09845, pruned_loss=0.01451, audio_tagging_loss=0.006194, over 16947.00 frames. 
], tot_loss[loss=0.06624, simple_loss=0.09009, pruned_loss=0.01254, audio_tagging_loss=0.008665, over 3055252.69 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:12:03,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3155853.3333333335, ans=0.1 2023-11-27 17:12:07,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3155920.0, ans=0.0 2023-11-27 17:12:14,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3155920.0, ans=0.1 2023-11-27 17:12:16,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2023-11-27 17:12:19,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473400 2023-11-27 17:12:22,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3155986.6666666665, ans=0.125 2023-11-27 17:12:23,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3155986.6666666665, ans=0.125 2023-11-27 17:12:24,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3155986.6666666665, ans=0.125 2023-11-27 17:12:26,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3155986.6666666665, ans=0.125 2023-11-27 17:12:38,300 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.835e+01 9.403e+01 1.018e+02 2.786e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-27 17:12:49,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2023-11-27 17:12:53,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3156186.6666666665, ans=0.1 2023-11-27 17:12:54,384 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4500, loss[loss=0.04564, simple_loss=0.06212, pruned_loss=0.007031, audio_tagging_loss=0.007552, over 15040.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09019, pruned_loss=0.01262, audio_tagging_loss=0.008661, over 3051952.42 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:13:17,174 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473450 2023-11-27 17:13:31,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3156386.6666666665, ans=0.95 2023-11-27 17:13:36,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3156386.6666666665, ans=0.125 2023-11-27 17:13:42,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3156453.3333333335, ans=0.1 2023-11-27 17:13:50,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3156453.3333333335, ans=0.1 2023-11-27 17:13:52,307 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4550, loss[loss=0.05119, simple_loss=0.06116, pruned_loss=0.009118, audio_tagging_loss=0.01149, over 14946.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0894, pruned_loss=0.01252, audio_tagging_loss=0.008761, over 3043873.92 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:13:53,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3156520.0, ans=0.0 2023-11-27 17:13:57,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-11-27 17:14:12,442 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:14:15,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473500 2023-11-27 17:14:27,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3156720.0, ans=0.125 2023-11-27 17:14:33,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.569e+01 9.256e+01 9.932e+01 4.356e+02, threshold=1.851e+02, percent-clipped=1.0 2023-11-27 17:14:39,253 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:14:39,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3156786.6666666665, ans=0.0 2023-11-27 17:14:49,599 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4600, loss[loss=0.04566, simple_loss=0.05517, pruned_loss=0.00719, audio_tagging_loss=0.01089, over 14794.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08961, pruned_loss=0.01252, audio_tagging_loss=0.008903, over 3045691.09 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:15:12,823 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473550 2023-11-27 17:15:15,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3156986.6666666665, ans=0.0 2023-11-27 17:15:21,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3156986.6666666665, ans=0.0 2023-11-27 17:15:47,486 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4650, loss[loss=0.07721, simple_loss=0.1008, pruned_loss=0.01704, audio_tagging_loss=0.009763, over 15072.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.0898, pruned_loss=0.01265, audio_tagging_loss=0.008971, over 3039387.85 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:16:04,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3157253.3333333335, ans=0.0 2023-11-27 17:16:08,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3157253.3333333335, ans=0.2 2023-11-27 17:16:10,318 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473600 2023-11-27 17:16:11,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3157320.0, ans=0.125 2023-11-27 17:16:29,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 8.758e+01 9.328e+01 1.030e+02 1.229e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 17:16:34,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.56 vs. limit=10.0 2023-11-27 17:16:40,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2023-11-27 17:16:45,825 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4700, loss[loss=0.06884, simple_loss=0.09045, pruned_loss=0.01491, audio_tagging_loss=0.008698, over 15627.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09024, pruned_loss=0.01262, audio_tagging_loss=0.008987, over 3043003.17 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:16:58,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3157586.6666666665, ans=0.125 2023-11-27 17:16:59,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3157586.6666666665, ans=0.125 2023-11-27 17:17:08,413 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473650 2023-11-27 17:17:26,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3157720.0, ans=0.125 2023-11-27 17:17:33,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3157786.6666666665, ans=0.2 2023-11-27 17:17:43,358 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4750, loss[loss=0.06691, simple_loss=0.0968, pruned_loss=0.01098, audio_tagging_loss=0.007526, over 14406.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09026, pruned_loss=0.01257, audio_tagging_loss=0.009006, over 3043058.12 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:17:44,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3157853.3333333335, ans=0.125 2023-11-27 17:17:52,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.36 vs. limit=10.0 2023-11-27 17:18:06,418 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473700 2023-11-27 17:18:24,344 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.859e+01 9.575e+01 1.045e+02 1.210e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 17:18:39,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3158186.6666666665, ans=0.0 2023-11-27 17:18:40,222 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4800, loss[loss=0.05817, simple_loss=0.08016, pruned_loss=0.009013, audio_tagging_loss=0.009079, over 13701.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09005, pruned_loss=0.01263, audio_tagging_loss=0.009131, over 3044659.16 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:18:40,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158186.6666666665, ans=0.1 2023-11-27 17:18:43,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3158186.6666666665, ans=0.125 2023-11-27 17:18:53,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-11-27 17:19:03,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473750 2023-11-27 17:19:10,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3158320.0, ans=0.0 2023-11-27 17:19:14,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=12.0 2023-11-27 17:19:19,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3158386.6666666665, ans=0.0 2023-11-27 17:19:22,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3158386.6666666665, ans=0.05 2023-11-27 17:19:24,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3158386.6666666665, ans=0.0 2023-11-27 17:19:29,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5 2023-11-27 17:19:38,181 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4850, loss[loss=0.05964, simple_loss=0.07909, pruned_loss=0.00964, audio_tagging_loss=0.01045, over 14357.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09025, pruned_loss=0.0127, audio_tagging_loss=0.009198, over 3045236.33 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:19:56,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3158586.6666666665, ans=0.125 2023-11-27 17:19:56,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2023-11-27 17:20:01,594 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473800 2023-11-27 17:20:08,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3158653.3333333335, ans=0.125 2023-11-27 17:20:12,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2023-11-27 17:20:14,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3158720.0, ans=0.125 2023-11-27 17:20:15,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158720.0, ans=0.1 2023-11-27 17:20:20,713 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.680e+01 9.364e+01 9.927e+01 1.620e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 17:20:26,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2023-11-27 17:20:36,673 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4900, loss[loss=0.0633, simple_loss=0.08944, pruned_loss=0.009914, audio_tagging_loss=0.008664, over 16315.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0897, pruned_loss=0.01245, audio_tagging_loss=0.009164, over 3038871.21 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:20:39,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3158853.3333333335, ans=0.0 2023-11-27 17:20:42,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158853.3333333335, ans=0.1 2023-11-27 17:20:45,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3158853.3333333335, ans=0.0 2023-11-27 17:20:48,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3158920.0, ans=0.07 2023-11-27 17:21:00,021 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473850 2023-11-27 17:21:34,306 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4950, loss[loss=0.07076, simple_loss=0.09975, pruned_loss=0.0137, audio_tagging_loss=0.00719, over 16252.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.0898, pruned_loss=0.01243, audio_tagging_loss=0.009008, over 3043740.89 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:21:38,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. 
limit=15.0 2023-11-27 17:21:41,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3159186.6666666665, ans=0.0 2023-11-27 17:21:55,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3159253.3333333335, ans=0.125 2023-11-27 17:21:57,369 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473900 2023-11-27 17:21:58,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3159320.0, ans=0.125 2023-11-27 17:22:09,042 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:22:16,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.677e+01 9.528e+01 1.024e+02 1.553e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 17:22:25,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3159453.3333333335, ans=0.125 2023-11-27 17:22:31,925 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5000, loss[loss=0.06454, simple_loss=0.09034, pruned_loss=0.01216, audio_tagging_loss=0.007202, over 15404.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09057, pruned_loss=0.01266, audio_tagging_loss=0.008782, over 3042141.49 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:22:34,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0 2023-11-27 17:22:35,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3159520.0, ans=0.125 2023-11-27 17:22:37,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2023-11-27 17:22:55,044 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473950 2023-11-27 17:23:15,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-11-27 17:23:22,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0 2023-11-27 17:23:29,486 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5050, loss[loss=0.06824, simple_loss=0.08679, pruned_loss=0.01375, audio_tagging_loss=0.0111, over 15795.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08972, pruned_loss=0.0126, audio_tagging_loss=0.008723, over 3044399.00 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:23:31,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3159853.3333333335, ans=0.125 2023-11-27 17:23:33,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3159853.3333333335, ans=0.125 2023-11-27 17:23:41,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=22.5 2023-11-27 17:23:52,182 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474000 2023-11-27 17:24:12,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.599e+01 9.260e+01 9.891e+01 1.238e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 17:24:16,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3160120.0, ans=0.0 2023-11-27 17:24:18,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3160120.0, ans=0.125 2023-11-27 17:24:24,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3160120.0, ans=0.1 2023-11-27 17:24:27,712 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5100, loss[loss=0.08234, simple_loss=0.1142, pruned_loss=0.02094, audio_tagging_loss=0.004269, over 16365.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08922, pruned_loss=0.0126, audio_tagging_loss=0.008635, over 3037300.24 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:24:49,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3160253.3333333335, ans=0.125 2023-11-27 17:24:51,240 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474050 2023-11-27 17:24:55,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.33 vs. limit=15.0 2023-11-27 17:25:24,993 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5150, loss[loss=0.07474, simple_loss=0.0985, pruned_loss=0.01706, audio_tagging_loss=0.00843, over 14556.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09036, pruned_loss=0.01281, audio_tagging_loss=0.008615, over 3037111.47 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:25:40,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3160586.6666666665, ans=0.0 2023-11-27 17:25:48,583 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474100 2023-11-27 17:26:07,187 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.651e+01 9.333e+01 9.963e+01 1.109e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 17:26:08,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3160720.0, ans=0.125 2023-11-27 17:26:11,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3160786.6666666665, ans=0.0 2023-11-27 17:26:22,467 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5200, loss[loss=0.09358, simple_loss=0.1368, pruned_loss=0.02083, audio_tagging_loss=0.004357, over 15454.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09141, pruned_loss=0.01305, audio_tagging_loss=0.008568, over 3033259.59 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:26:28,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3160853.3333333335, ans=0.125 2023-11-27 17:26:42,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. 
limit=10.0 2023-11-27 17:26:45,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474150 2023-11-27 17:26:56,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2023-11-27 17:27:09,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3161120.0, ans=0.0 2023-11-27 17:27:16,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5 2023-11-27 17:27:17,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3161120.0, ans=0.0 2023-11-27 17:27:20,053 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5250, loss[loss=0.06001, simple_loss=0.08037, pruned_loss=0.007629, audio_tagging_loss=0.0122, over 14504.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09215, pruned_loss=0.01314, audio_tagging_loss=0.008553, over 3031097.62 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:27:33,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3161253.3333333335, ans=0.2 2023-11-27 17:27:41,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3161320.0, ans=0.0 2023-11-27 17:27:42,542 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474200 2023-11-27 17:27:54,042 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.579e-03 2023-11-27 17:28:03,022 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 8.718e+01 9.401e+01 1.041e+02 1.435e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 17:28:04,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2023-11-27 17:28:05,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3161453.3333333335, ans=0.125 2023-11-27 17:28:07,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3161453.3333333335, ans=0.0 2023-11-27 17:28:11,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3161453.3333333335, ans=0.0 2023-11-27 17:28:16,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=12.0 2023-11-27 17:28:17,163 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5300, loss[loss=0.07305, simple_loss=0.1024, pruned_loss=0.01613, audio_tagging_loss=0.00575, over 15160.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09174, pruned_loss=0.01304, audio_tagging_loss=0.008585, over 3035492.66 frames. 
2023-11-27 17:28:25,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3161520.0, ans=0.2
2023-11-27 17:28:40,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474250
2023-11-27 17:28:50,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3161720.0, ans=0.125
2023-11-27 17:28:52,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161720.0, ans=0.1
2023-11-27 17:28:56,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161720.0, ans=0.1
2023-11-27 17:28:57,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3161720.0, ans=0.125
2023-11-27 17:29:00,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3161720.0, ans=0.025
2023-11-27 17:29:03,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3161786.6666666665, ans=0.0
2023-11-27 17:29:14,709 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5350, loss[loss=0.08957, simple_loss=0.1336, pruned_loss=0.01676, audio_tagging_loss=0.005986, over 15473.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09215, pruned_loss=0.013, audio_tagging_loss=0.008562, over 3034613.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 17:29:23,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3161853.3333333335, ans=0.0
2023-11-27 17:29:37,999 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474300
2023-11-27 17:29:57,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.549e+01 9.139e+01 9.970e+01 1.797e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-27 17:30:13,035 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5400, loss[loss=0.06693, simple_loss=0.081, pruned_loss=0.01526, audio_tagging_loss=0.01117, over 15467.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09131, pruned_loss=0.01272, audio_tagging_loss=0.008652, over 3032813.92 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 17:30:13,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3162186.6666666665, ans=0.2
2023-11-27 17:30:35,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474350
2023-11-27 17:31:09,435 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5450, loss[loss=0.07132, simple_loss=0.09469, pruned_loss=0.01743, audio_tagging_loss=0.00654, over 15241.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09062, pruned_loss=0.01281, audio_tagging_loss=0.008757, over 3028893.07 frames.
], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:31:33,084 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474400 2023-11-27 17:31:53,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.734e+01 9.322e+01 1.014e+02 1.420e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-27 17:32:05,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3162786.6666666665, ans=0.2 2023-11-27 17:32:07,529 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5500, loss[loss=0.05527, simple_loss=0.07447, pruned_loss=0.007868, audio_tagging_loss=0.01016, over 14678.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09065, pruned_loss=0.01277, audio_tagging_loss=0.008858, over 3026597.85 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:32:10,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3162853.3333333335, ans=0.125 2023-11-27 17:32:14,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0 2023-11-27 17:32:30,730 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474450 2023-11-27 17:32:37,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2023-11-27 17:32:47,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163053.3333333335, ans=0.1 2023-11-27 17:32:53,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3163120.0, ans=0.125 2023-11-27 17:33:05,365 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5550, loss[loss=0.06407, simple_loss=0.08372, pruned_loss=0.01297, audio_tagging_loss=0.009243, over 16179.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09106, pruned_loss=0.01275, audio_tagging_loss=0.00889, over 3031069.88 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:33:14,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3163186.6666666665, ans=0.0 2023-11-27 17:33:17,632 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:33:27,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474500 2023-11-27 17:33:49,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.657e+01 9.312e+01 9.840e+01 1.170e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-27 17:34:01,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3163520.0, ans=0.125 2023-11-27 17:34:02,471 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5600, loss[loss=0.05988, simple_loss=0.0757, pruned_loss=0.01199, audio_tagging_loss=0.01004, over 15172.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09174, pruned_loss=0.01295, audio_tagging_loss=0.008916, over 3036180.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:34:14,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. 
limit=6.0
2023-11-27 17:34:25,492 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474550
2023-11-27 17:34:47,597 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 17:34:51,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3163786.6666666665, ans=0.125
2023-11-27 17:34:59,979 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5650, loss[loss=0.05416, simple_loss=0.07147, pruned_loss=0.009668, audio_tagging_loss=0.008753, over 14496.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.0909, pruned_loss=0.01277, audio_tagging_loss=0.009001, over 3040538.14 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 17:35:11,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3163920.0, ans=0.125
2023-11-27 17:35:23,696 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474600
2023-11-27 17:35:34,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3164053.3333333335, ans=0.125
2023-11-27 17:35:45,035 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.679e+01 9.217e+01 1.003e+02 1.541e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-27 17:35:48,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3164120.0, ans=0.125
2023-11-27 17:35:49,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0
2023-11-27 17:35:58,296 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5700, loss[loss=0.0579, simple_loss=0.08116, pruned_loss=0.01044, audio_tagging_loss=0.006885, over 15399.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.09172, pruned_loss=0.01297, audio_tagging_loss=0.008884, over 3044997.94 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:36:05,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3164186.6666666665, ans=0.125
2023-11-27 17:36:10,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.56 vs. limit=15.0
2023-11-27 17:36:20,619 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474650
2023-11-27 17:36:33,641 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 17:36:44,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3164453.3333333335, ans=0.1
2023-11-27 17:36:44,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0
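The WARNING above shows why a cut gets dropped: this AudioSet clip carries placeholder text, and after subsampling its 100 input frames leave only 23 encoder frames against 24 BPE tokens, so no valid transducer alignment exists. A sketch of the rule implied by the logged numbers (the function name is hypothetical, and the actual check in train_asr.py may differ in detail):

```python
# Hypothetical form of the exclusion check implied by the warning above:
# a cut is unusable when it has fewer encoder frames than output tokens.
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    return frames_after_subsampling >= num_tokens

# The excluded cut: 100 frames before subsampling -> 23 after, but 24 tokens.
assert not keep_cut(23, 24)
```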
2023-11-27 17:36:55,057 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5750, loss[loss=0.05463, simple_loss=0.06062, pruned_loss=0.01408, audio_tagging_loss=0.01024, over 14720.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09095, pruned_loss=0.01274, audio_tagging_loss=0.008766, over 3046570.51 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:36:55,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3164520.0, ans=0.125
2023-11-27 17:36:58,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3164520.0, ans=0.0
2023-11-27 17:37:04,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3164520.0, ans=0.0
2023-11-27 17:37:18,035 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474700
2023-11-27 17:37:35,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3164720.0, ans=0.125
2023-11-27 17:37:39,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.634e+01 9.303e+01 1.008e+02 1.326e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 17:37:45,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3164786.6666666665, ans=0.125
2023-11-27 17:37:52,598 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5800, loss[loss=0.06916, simple_loss=0.09278, pruned_loss=0.01424, audio_tagging_loss=0.008531, over 16227.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09133, pruned_loss=0.0128, audio_tagging_loss=0.008748, over 3047980.89 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:37:52,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3164853.3333333335, ans=0.0
2023-11-27 17:37:52,915 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 17:38:14,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3164986.6666666665, ans=0.07
2023-11-27 17:38:15,584 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474750
2023-11-27 17:38:16,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3164986.6666666665, ans=0.0
2023-11-27 17:38:46,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3165120.0, ans=0.95
2023-11-27 17:38:49,885 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5850, loss[loss=0.05807, simple_loss=0.08133, pruned_loss=0.009446, audio_tagging_loss=0.007953, over 14601.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09167, pruned_loss=0.01284, audio_tagging_loss=0.008652, over 3047317.02 frames.
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:39:13,005 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474800 2023-11-27 17:39:19,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3165320.0, ans=0.125 2023-11-27 17:39:20,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3165320.0, ans=0.125 2023-11-27 17:39:26,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3165386.6666666665, ans=0.125 2023-11-27 17:39:30,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-11-27 17:39:35,138 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.698e+01 9.361e+01 9.946e+01 1.172e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 17:39:44,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3165453.3333333335, ans=0.125 2023-11-27 17:39:46,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3165520.0, ans=0.0 2023-11-27 17:39:48,451 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5900, loss[loss=0.07168, simple_loss=0.09278, pruned_loss=0.01516, audio_tagging_loss=0.01014, over 15759.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09107, pruned_loss=0.01274, audio_tagging_loss=0.008656, over 3054277.96 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:40:05,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3165586.6666666665, ans=0.09899494936611666 2023-11-27 17:40:11,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474850 2023-11-27 17:40:46,222 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5950, loss[loss=0.06005, simple_loss=0.07723, pruned_loss=0.01064, audio_tagging_loss=0.01079, over 15178.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09183, pruned_loss=0.01295, audio_tagging_loss=0.008571, over 3057803.96 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:40:56,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3165920.0, ans=0.125
2023-11-27 17:41:04,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3165920.0, ans=0.125
2023-11-27 17:41:07,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3165920.0, ans=0.1
2023-11-27 17:41:07,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3165920.0, ans=0.0
2023-11-27 17:41:09,172 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474900
2023-11-27 17:41:27,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3166053.3333333335, ans=0.125
2023-11-27 17:41:30,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.669e+01 9.306e+01 1.020e+02 1.374e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 17:41:33,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3166120.0, ans=0.0
2023-11-27 17:41:38,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3166120.0, ans=0.0
2023-11-27 17:41:43,477 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6000, loss[loss=0.05023, simple_loss=0.06659, pruned_loss=0.008794, audio_tagging_loss=0.008144, over 16272.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09192, pruned_loss=0.01296, audio_tagging_loss=0.008632, over 3061394.75 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 17:41:43,484 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-27 17:42:18,059 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.05751, simple_loss=0.05064, pruned_loss=0.005151, audio_tagging_loss=0.02703, over 4681554.00 frames.
2023-11-27 17:42:18,060 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-27 17:42:32,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-27 17:42:36,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3166253.3333333335, ans=0.125
2023-11-27 17:42:40,798 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474950
2023-11-27 17:42:44,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3166320.0, ans=0.0
2023-11-27 17:42:57,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3166386.6666666665, ans=0.95
2023-11-27 17:43:02,560 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
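Two things stand out around the batch 6000 records above. First, the mid-epoch validation pass shows audio_tagging_loss (0.02703) dominating the validation total (0.05751), unlike the training batches. Second, grad_scale keeps toggling between 16.0 and 32.0, the signature of dynamic loss scaling for fp16: the scale is halved when a step overflows and grown again after a run of stable steps. A sketch of that pattern with a PyTorch-style scaler (assuming an API-compatible GradScaler; the actual training loop wraps more logic around this):

```python
import torch

# Sketch of dynamic loss scaling, assuming a torch.cuda.amp-style scaler.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()  # backprop the scaled loss
    scaler.step(optimizer)         # skipped if any gradient overflowed
    scaler.update()                # halve scale on overflow, grow when stable
```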
2023-11-27 17:43:09,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2023-11-27 17:43:09,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5
2023-11-27 17:43:14,954 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6050, loss[loss=0.0926, simple_loss=0.1207, pruned_loss=0.02413, audio_tagging_loss=0.008104, over 15149.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09175, pruned_loss=0.01281, audio_tagging_loss=0.008573, over 3061265.11 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:43:26,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3166586.6666666665, ans=0.1
2023-11-27 17:43:37,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3166653.3333333335, ans=0.125
2023-11-27 17:43:38,173 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475000
2023-11-27 17:43:41,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3166653.3333333335, ans=0.125
2023-11-27 17:43:48,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3166720.0, ans=0.0
2023-11-27 17:43:49,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0
2023-11-27 17:44:01,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.500e+01 9.097e+01 9.950e+01 1.272e+02, threshold=1.819e+02, percent-clipped=0.0
2023-11-27 17:44:03,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0
2023-11-27 17:44:12,682 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6100, loss[loss=0.08662, simple_loss=0.1178, pruned_loss=0.01872, audio_tagging_loss=0.00899, over 15906.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09133, pruned_loss=0.01272, audio_tagging_loss=0.008594, over 3060299.99 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:44:15,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs.
limit=15.0 2023-11-27 17:44:24,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3166920.0, ans=0.1 2023-11-27 17:44:25,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3166920.0, ans=0.0 2023-11-27 17:44:26,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3166920.0, ans=0.0 2023-11-27 17:44:35,833 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475050 2023-11-27 17:44:39,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3166986.6666666665, ans=0.035 2023-11-27 17:44:46,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3167053.3333333335, ans=0.125 2023-11-27 17:45:09,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3167186.6666666665, ans=0.04949747468305833 2023-11-27 17:45:10,565 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6150, loss[loss=0.08957, simple_loss=0.1234, pruned_loss=0.02054, audio_tagging_loss=0.007349, over 15466.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09139, pruned_loss=0.01287, audio_tagging_loss=0.008623, over 3061725.38 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:45:10,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3167186.6666666665, ans=0.0 2023-11-27 17:45:24,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=12.0 2023-11-27 17:45:34,349 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475100 2023-11-27 17:45:35,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3167320.0, ans=0.125 2023-11-27 17:45:54,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3167386.6666666665, ans=0.125 2023-11-27 17:45:56,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.777e+01 9.490e+01 1.001e+02 1.284e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 17:45:57,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=10.0 2023-11-27 17:46:00,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3167453.3333333335, ans=0.125 2023-11-27 17:46:04,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3167453.3333333335, ans=0.2 2023-11-27 17:46:08,777 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6200, loss[loss=0.07715, simple_loss=0.1149, pruned_loss=0.01009, audio_tagging_loss=0.009613, over 15505.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09067, pruned_loss=0.01265, audio_tagging_loss=0.008733, over 3054501.65 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:46:10,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.32 vs. 
limit=15.0 2023-11-27 17:46:24,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3167586.6666666665, ans=0.0 2023-11-27 17:46:31,905 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475150 2023-11-27 17:46:53,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3167786.6666666665, ans=0.0 2023-11-27 17:47:05,734 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6250, loss[loss=0.06497, simple_loss=0.07792, pruned_loss=0.01393, audio_tagging_loss=0.01208, over 15409.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09007, pruned_loss=0.01256, audio_tagging_loss=0.008799, over 3056394.34 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:47:14,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3167853.3333333335, ans=0.0 2023-11-27 17:47:28,406 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475200 2023-11-27 17:47:41,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3168053.3333333335, ans=0.5 2023-11-27 17:47:51,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2023-11-27 17:47:52,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.766e+01 9.317e+01 1.001e+02 1.294e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 17:48:03,985 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6300, loss[loss=0.05641, simple_loss=0.07579, pruned_loss=0.008516, audio_tagging_loss=0.009995, over 15871.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09041, pruned_loss=0.0126, audio_tagging_loss=0.008935, over 3053204.67 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:48:27,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475250 2023-11-27 17:48:35,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3168320.0, ans=0.125 2023-11-27 17:48:42,813 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:48:54,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3168453.3333333335, ans=0.1 2023-11-27 17:48:57,100 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:49:01,772 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6350, loss[loss=0.08676, simple_loss=0.1183, pruned_loss=0.02053, audio_tagging_loss=0.007054, over 14016.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.0911, pruned_loss=0.01282, audio_tagging_loss=0.008919, over 3044097.37 frames. 
], batch size: 52, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:49:05,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3168520.0, ans=0.125 2023-11-27 17:49:06,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3168520.0, ans=0.02 2023-11-27 17:49:18,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3168586.6666666665, ans=0.125 2023-11-27 17:49:21,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3168586.6666666665, ans=0.125 2023-11-27 17:49:25,267 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475300 2023-11-27 17:49:42,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3168720.0, ans=0.125 2023-11-27 17:49:47,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 8.715e+01 9.486e+01 1.017e+02 1.352e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 17:49:49,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3168786.6666666665, ans=0.125 2023-11-27 17:50:00,008 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6400, loss[loss=0.09549, simple_loss=0.1373, pruned_loss=0.01793, audio_tagging_loss=0.008912, over 15613.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09065, pruned_loss=0.01273, audio_tagging_loss=0.009041, over 3048178.91 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:50:04,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=22.5 2023-11-27 17:50:19,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3168920.0, ans=0.125 2023-11-27 17:50:22,398 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475350 2023-11-27 17:50:26,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2023-11-27 17:50:41,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3169053.3333333335, ans=0.0 2023-11-27 17:50:41,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.64 vs. limit=10.0 2023-11-27 17:50:57,148 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6450, loss[loss=0.08074, simple_loss=0.1165, pruned_loss=0.01471, audio_tagging_loss=0.007792, over 15912.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09053, pruned_loss=0.01269, audio_tagging_loss=0.009087, over 3048701.98 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:51:01,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3169186.6666666665, ans=0.2
2023-11-27 17:51:12,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3169253.3333333335, ans=0.1
2023-11-27 17:51:15,519 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 17:51:20,169 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475400
2023-11-27 17:51:41,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3169386.6666666665, ans=0.2
2023-11-27 17:51:44,568 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 8.833e+01 9.242e+01 1.006e+02 1.317e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-27 17:51:48,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3169453.3333333335, ans=0.125
2023-11-27 17:51:53,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3169520.0, ans=0.125
2023-11-27 17:51:54,621 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6500, loss[loss=0.06397, simple_loss=0.09224, pruned_loss=0.01034, audio_tagging_loss=0.007505, over 15422.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09069, pruned_loss=0.01281, audio_tagging_loss=0.009108, over 3048212.65 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:52:17,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3169653.3333333335, ans=0.125
2023-11-27 17:52:18,396 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475450
2023-11-27 17:52:23,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3169653.3333333335, ans=0.125
2023-11-27 17:52:26,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3169653.3333333335, ans=0.2
2023-11-27 17:52:29,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3169720.0, ans=0.1
2023-11-27 17:52:37,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0
2023-11-27 17:52:40,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3169786.6666666665, ans=0.125
2023-11-27 17:52:42,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3169786.6666666665, ans=0.0
2023-11-27 17:52:45,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0
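The Whitening lines above compare a per-module statistic against a limit; the statistic is smallest when the module's activations are decorrelated across channels. An illustrative metric with that behavior, equal to 1.0 when the activation covariance is proportional to the identity and growing as channels become correlated or unevenly scaled (the exact formula in scaling.py may differ in detail):

```python
import torch

# Illustrative whitening metric: 1.0 for perfectly "white" activations,
# larger when channels are correlated or unevenly scaled.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]  # channel covariance
    num_channels = cov.shape[0]
    return (num_channels * (cov ** 2).mean() / cov.diag().mean() ** 2).item()

print(whitening_metric(torch.randn(10000, 256)))  # close to 1.0
```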
2023-11-27 17:52:50,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3169786.6666666665, ans=0.125
2023-11-27 17:52:53,636 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6550, loss[loss=0.08494, simple_loss=0.1231, pruned_loss=0.01609, audio_tagging_loss=0.007284, over 15573.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09078, pruned_loss=0.01295, audio_tagging_loss=0.00892, over 3052540.80 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:52:53,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3169853.3333333335, ans=0.95
2023-11-27 17:52:59,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.44 vs. limit=15.0
2023-11-27 17:53:08,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3169920.0, ans=0.125
2023-11-27 17:53:16,463 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475500
2023-11-27 17:53:40,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.596e+01 9.247e+01 9.962e+01 1.603e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 17:53:48,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3170120.0, ans=0.125
2023-11-27 17:53:51,281 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6600, loss[loss=0.08426, simple_loss=0.1112, pruned_loss=0.02058, audio_tagging_loss=0.008078, over 15625.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08949, pruned_loss=0.01265, audio_tagging_loss=0.008879, over 3051330.17 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:54:12,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3170320.0, ans=0.0
2023-11-27 17:54:13,805 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475550
2023-11-27 17:54:24,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3170386.6666666665, ans=0.0
2023-11-27 17:54:33,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3170386.6666666665, ans=0.125
2023-11-27 17:54:37,711 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 17:54:48,461 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6650, loss[loss=0.05449, simple_loss=0.07573, pruned_loss=0.01093, audio_tagging_loss=0.005689, over 15550.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08885, pruned_loss=0.01253, audio_tagging_loss=0.008903, over 3050842.97 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:54:56,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0
2023-11-27 17:55:11,990 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475600
2023-11-27 17:55:26,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.71 vs.
limit=10.0
2023-11-27 17:55:36,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.804e+01 9.430e+01 1.026e+02 1.343e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-27 17:55:39,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3170786.6666666665, ans=0.125
2023-11-27 17:55:46,568 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6700, loss[loss=0.06997, simple_loss=0.09894, pruned_loss=0.01235, audio_tagging_loss=0.008145, over 15759.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08865, pruned_loss=0.01239, audio_tagging_loss=0.008869, over 3051542.73 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:55:46,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3170853.3333333335, ans=0.125
2023-11-27 17:55:59,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3170920.0, ans=10.0
2023-11-27 17:56:09,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475650
2023-11-27 17:56:11,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3170986.6666666665, ans=0.0
2023-11-27 17:56:17,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3170986.6666666665, ans=0.125
2023-11-27 17:56:43,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3171186.6666666665, ans=0.0
2023-11-27 17:56:44,908 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6750, loss[loss=0.04739, simple_loss=0.06829, pruned_loss=0.006528, audio_tagging_loss=0.00672, over 14455.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08918, pruned_loss=0.01248, audio_tagging_loss=0.008701, over 3048689.26 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:57:00,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0
2023-11-27 17:57:07,558 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475700
2023-11-27 17:57:16,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3171320.0, ans=0.125
2023-11-27 17:57:32,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.726e+01 9.253e+01 9.869e+01 1.204e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-27 17:57:42,253 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6800, loss[loss=0.0594, simple_loss=0.08371, pruned_loss=0.01102, audio_tagging_loss=0.006526, over 14438.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08947, pruned_loss=0.0125, audio_tagging_loss=0.008654, over 3059260.21 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 17:57:59,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
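The grad-norm lines above make the clipping rule visible: with Clipping_scale=2.0, the logged threshold tracks twice the median of recent gradient norms, e.g. 2.0 × 9.430e+01 = 1.886e+02 and 2.0 × 9.253e+01 ≈ 1.851e+02 in the two records above. A sketch of median-based clipping (the norm bookkeeping in optim.py is assumed, not copied):

```python
import torch

# Sketch: clip to a multiple of the median of recently observed grad norms.
def clip_gradients(parameters, recent_grad_norms: list, clipping_scale: float = 2.0):
    threshold = clipping_scale * torch.tensor(recent_grad_norms).median().item()
    torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
    return threshold

# First record above: median grad norm 9.430e+01 -> threshold 1.886e+02.
assert abs(2.0 * 94.30 - 188.6) < 1e-9
```

percent-clipped=0.0 throughout the section says no batch in this stretch actually exceeded the threshold, i.e. the quartiles sit comfortably below 2x their own median.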
2023-11-27 17:58:04,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3171653.3333333335, ans=0.125
2023-11-27 17:58:05,067 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475750
2023-11-27 17:58:05,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3171653.3333333335, ans=0.0
2023-11-27 17:58:09,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3171653.3333333335, ans=0.125
2023-11-27 17:58:10,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0
2023-11-27 17:58:40,106 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6850, loss[loss=0.05069, simple_loss=0.05549, pruned_loss=0.01189, audio_tagging_loss=0.01106, over 15023.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08963, pruned_loss=0.01247, audio_tagging_loss=0.008651, over 3053855.59 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:58:55,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=22.5
2023-11-27 17:58:59,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3171920.0, ans=0.125
2023-11-27 17:59:02,465 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 17:59:03,408 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475800
2023-11-27 17:59:10,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3171986.6666666665, ans=0.125
2023-11-27 17:59:14,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3172053.3333333335, ans=0.2
2023-11-27 17:59:28,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.901e+01 9.541e+01 1.005e+02 1.351e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-27 17:59:32,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0
2023-11-27 17:59:38,196 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6900, loss[loss=0.07221, simple_loss=0.1021, pruned_loss=0.01326, audio_tagging_loss=0.007921, over 16186.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09061, pruned_loss=0.01258, audio_tagging_loss=0.008599, over 3057638.47 frames.
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:59:40,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3172186.6666666665, ans=0.0 2023-11-27 17:59:46,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3172186.6666666665, ans=0.1 2023-11-27 17:59:50,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3172253.3333333335, ans=0.125 2023-11-27 18:00:01,159 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475850 2023-11-27 18:00:03,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3172320.0, ans=0.125 2023-11-27 18:00:11,102 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5 2023-11-27 18:00:25,632 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:00:25,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3172453.3333333335, ans=0.125 2023-11-27 18:00:36,169 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6950, loss[loss=0.06826, simple_loss=0.09476, pruned_loss=0.01455, audio_tagging_loss=0.006331, over 15886.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09115, pruned_loss=0.01274, audio_tagging_loss=0.008669, over 3055169.34 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:00:38,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3172520.0, ans=0.125 2023-11-27 18:00:59,153 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475900 2023-11-27 18:01:00,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3172653.3333333335, ans=0.1 2023-11-27 18:01:04,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3172653.3333333335, ans=0.0 2023-11-27 18:01:18,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3172720.0, ans=0.0 2023-11-27 18:01:24,335 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.751e+01 9.118e+01 9.607e+01 1.229e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 18:01:24,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3172786.6666666665, ans=0.125 2023-11-27 18:01:33,720 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7000, loss[loss=0.0658, simple_loss=0.09042, pruned_loss=0.01191, audio_tagging_loss=0.008678, over 14363.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09056, pruned_loss=0.0127, audio_tagging_loss=0.008688, over 3053213.04 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:01:38,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3172853.3333333335, ans=0.5 2023-11-27 18:01:41,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3172853.3333333335, ans=0.2 2023-11-27 18:01:48,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2023-11-27 18:01:53,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3172920.0, ans=0.1 2023-11-27 18:01:56,651 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475950 2023-11-27 18:01:57,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3172986.6666666665, ans=0.0 2023-11-27 18:01:57,929 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:01:58,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3172986.6666666665, ans=0.125 2023-11-27 18:01:59,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0 2023-11-27 18:02:26,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2023-11-27 18:02:28,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3173120.0, ans=0.125 2023-11-27 18:02:28,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3173120.0, ans=0.125 2023-11-27 18:02:30,896 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7050, loss[loss=0.06641, simple_loss=0.09416, pruned_loss=0.01051, audio_tagging_loss=0.008822, over 15770.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09046, pruned_loss=0.01263, audio_tagging_loss=0.008766, over 3050273.32 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:02:50,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.61 vs. limit=15.0 2023-11-27 18:02:54,199 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476000 2023-11-27 18:03:21,881 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.705e+01 9.232e+01 9.917e+01 1.412e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 18:03:31,261 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7100, loss[loss=0.06558, simple_loss=0.08684, pruned_loss=0.01272, audio_tagging_loss=0.009438, over 15470.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09074, pruned_loss=0.01274, audio_tagging_loss=0.008794, over 3046525.95 frames. 
], batch size: 61, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:03:49,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3173586.6666666665, ans=0.025
2023-11-27 18:03:54,140 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476050
2023-11-27 18:04:10,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. limit=10.0
2023-11-27 18:04:14,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3173720.0, ans=0.125
2023-11-27 18:04:19,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3173786.6666666665, ans=0.125
2023-11-27 18:04:21,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.56 vs. limit=15.0
2023-11-27 18:04:28,663 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7150, loss[loss=0.05863, simple_loss=0.07971, pruned_loss=0.01138, audio_tagging_loss=0.007395, over 15564.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09103, pruned_loss=0.01285, audio_tagging_loss=0.008794, over 3044061.18 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:04:42,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0
2023-11-27 18:04:43,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0
2023-11-27 18:04:51,747 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476100
2023-11-27 18:04:51,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3173986.6666666665, ans=0.0
2023-11-27 18:05:01,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0
2023-11-27 18:05:12,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3174053.3333333335, ans=0.0
2023-11-27 18:05:17,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.847e+01 9.283e+01 1.002e+02 1.688e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-27 18:05:22,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3174120.0, ans=0.0
2023-11-27 18:05:25,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0
2023-11-27 18:05:25,997 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7200, loss[loss=0.07312, simple_loss=0.1009, pruned_loss=0.01447, audio_tagging_loss=0.00822, over 15442.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09075, pruned_loss=0.01279, audio_tagging_loss=0.008884, over 3041657.66 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
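The ScheduledFloat lines record hyperparameters (skip rates, dropout_p, balancer probabilities) that are functions of the global batch count rather than constants; by this point in training (batch_count ≈ 3.17e6) most of them have settled at their final values, e.g. the skip rates at 0.0 and the dropout_p entries at 0.1. A minimal sketch of such a schedule, piecewise-linear in batch count (the breakpoints below are made up for illustration; the class of the same name in scaling.py carries more machinery):

```python
# Minimal piecewise-linear schedule over batch count, illustrating the kind
# of values the ScheduledFloat log lines report. Breakpoints are hypothetical.
class ScheduledFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # linear interpolation between neighboring breakpoints
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A skip rate decaying from 0.5 to 0.0 over the first 20k batches would
# long since read 0.0 at the batch counts logged above:
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value(3174120.0))  # 0.0
```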
2023-11-27 18:05:26,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3174186.6666666665, ans=0.125
2023-11-27 18:05:35,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3174186.6666666665, ans=0.2
2023-11-27 18:05:43,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3174253.3333333335, ans=0.125
2023-11-27 18:05:46,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3174253.3333333335, ans=0.0
2023-11-27 18:05:49,086 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476150
2023-11-27 18:05:55,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3174320.0, ans=0.125
2023-11-27 18:06:05,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3174386.6666666665, ans=0.125
2023-11-27 18:06:23,129 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7250, loss[loss=0.06451, simple_loss=0.09277, pruned_loss=0.01017, audio_tagging_loss=0.007955, over 14770.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09006, pruned_loss=0.01263, audio_tagging_loss=0.008905, over 3036690.42 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 18:06:33,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3174586.6666666665, ans=0.2
2023-11-27 18:06:35,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3174586.6666666665, ans=0.0
2023-11-27 18:06:35,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3174586.6666666665, ans=0.1
2023-11-27 18:06:46,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476200
2023-11-27 18:06:58,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3174720.0, ans=0.125
2023-11-27 18:07:09,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3174786.6666666665, ans=0.125
2023-11-27 18:07:11,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.649e+01 9.273e+01 9.853e+01 1.291e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-27 18:07:21,668 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7300, loss[loss=0.08723, simple_loss=0.1151, pruned_loss=0.02273, audio_tagging_loss=0.006963, over 15367.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09097, pruned_loss=0.01273, audio_tagging_loss=0.008805, over 3046285.69 frames.
], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:07:26,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3174853.3333333335, ans=0.0 2023-11-27 18:07:27,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3174853.3333333335, ans=0.125 2023-11-27 18:07:35,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3174920.0, ans=0.125 2023-11-27 18:07:44,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476250 2023-11-27 18:07:50,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3174986.6666666665, ans=0.2 2023-11-27 18:08:09,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2023-11-27 18:08:09,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-11-27 18:08:16,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=12.0 2023-11-27 18:08:19,042 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7350, loss[loss=0.06711, simple_loss=0.09435, pruned_loss=0.01265, audio_tagging_loss=0.007289, over 16572.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09108, pruned_loss=0.01267, audio_tagging_loss=0.008594, over 3049332.88 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:08:30,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3175253.3333333335, ans=0.125 2023-11-27 18:08:40,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3175320.0, ans=15.0 2023-11-27 18:08:41,487 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476300 2023-11-27 18:08:43,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3175320.0, ans=0.125 2023-11-27 18:08:46,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3175320.0, ans=0.0 2023-11-27 18:08:59,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3175386.6666666665, ans=0.0 2023-11-27 18:09:08,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.708e+01 9.249e+01 1.003e+02 1.493e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 18:09:15,684 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7400, loss[loss=0.08285, simple_loss=0.1157, pruned_loss=0.01832, audio_tagging_loss=0.006661, over 14485.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.0921, pruned_loss=0.01297, audio_tagging_loss=0.008427, over 3050581.21 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:09:36,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3175586.6666666665, ans=0.0 2023-11-27 18:09:39,302 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476350 2023-11-27 18:09:41,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3175653.3333333335, ans=0.0 2023-11-27 18:09:42,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3175653.3333333335, ans=0.0 2023-11-27 18:09:56,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3175720.0, ans=0.125 2023-11-27 18:09:56,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3175720.0, ans=0.0 2023-11-27 18:10:04,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3175786.6666666665, ans=0.125 2023-11-27 18:10:12,902 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7450, loss[loss=0.05598, simple_loss=0.08285, pruned_loss=0.006116, audio_tagging_loss=0.008438, over 15873.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09081, pruned_loss=0.01283, audio_tagging_loss=0.008509, over 3054478.02 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:10:14,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3175853.3333333335, ans=0.07 2023-11-27 18:10:14,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0 2023-11-27 18:10:15,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=22.5 2023-11-27 18:10:23,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3175920.0, ans=0.2 2023-11-27 18:10:33,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3175920.0, ans=0.1 2023-11-27 18:10:36,496 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476400 2023-11-27 18:10:51,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3176053.3333333335, ans=0.0 2023-11-27 18:10:53,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-27 18:10:57,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3176053.3333333335, ans=0.1 2023-11-27 18:11:02,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.662e+01 9.277e+01 9.892e+01 1.175e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 18:11:04,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. 
limit=15.0 2023-11-27 18:11:05,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3176120.0, ans=0.0 2023-11-27 18:11:11,062 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7500, loss[loss=0.06125, simple_loss=0.07166, pruned_loss=0.01479, audio_tagging_loss=0.01063, over 15197.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09062, pruned_loss=0.01287, audio_tagging_loss=0.008541, over 3054519.40 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:11:33,531 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476450 2023-11-27 18:11:41,844 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:11:47,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3176386.6666666665, ans=0.125 2023-11-27 18:11:59,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3176453.3333333335, ans=0.0 2023-11-27 18:12:00,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=12.0 2023-11-27 18:12:08,306 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7550, loss[loss=0.06451, simple_loss=0.08326, pruned_loss=0.01301, audio_tagging_loss=0.009867, over 15627.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09038, pruned_loss=0.01277, audio_tagging_loss=0.008505, over 3059958.80 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:12:15,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3176520.0, ans=10.0 2023-11-27 18:12:19,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3176586.6666666665, ans=0.125 2023-11-27 18:12:31,228 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476500 2023-11-27 18:12:39,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2023-11-27 18:12:43,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3176720.0, ans=10.0 2023-11-27 18:12:55,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2023-11-27 18:12:57,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 8.727e+01 9.587e+01 1.045e+02 1.317e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 18:13:05,311 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7600, loss[loss=0.07513, simple_loss=0.1044, pruned_loss=0.01418, audio_tagging_loss=0.008734, over 15588.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09014, pruned_loss=0.01279, audio_tagging_loss=0.008546, over 3054483.35 frames. 
], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:13:09,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3176853.3333333335, ans=0.125 2023-11-27 18:13:28,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476550 2023-11-27 18:13:35,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3176986.6666666665, ans=0.125 2023-11-27 18:13:55,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3177120.0, ans=0.2 2023-11-27 18:14:02,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3177186.6666666665, ans=0.125 2023-11-27 18:14:03,562 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7650, loss[loss=0.04718, simple_loss=0.06933, pruned_loss=0.004369, audio_tagging_loss=0.008151, over 14162.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09003, pruned_loss=0.01257, audio_tagging_loss=0.008548, over 3049519.93 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:14:26,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476600 2023-11-27 18:14:52,990 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.590e+01 9.167e+01 9.909e+01 1.245e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 18:15:01,113 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7700, loss[loss=0.06412, simple_loss=0.07887, pruned_loss=0.01329, audio_tagging_loss=0.01139, over 17042.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09015, pruned_loss=0.01262, audio_tagging_loss=0.00863, over 3049121.39 frames. ], batch size: 65, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:15:12,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3177586.6666666665, ans=0.0 2023-11-27 18:15:13,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3177586.6666666665, ans=0.0 2023-11-27 18:15:23,664 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476650 2023-11-27 18:15:24,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3177653.3333333335, ans=0.0 2023-11-27 18:15:26,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3177653.3333333335, ans=0.125 2023-11-27 18:15:28,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3177653.3333333335, ans=0.0 2023-11-27 18:15:30,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-11-27 18:15:31,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2023-11-27 18:15:57,818 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7750, loss[loss=0.08067, simple_loss=0.09591, pruned_loss=0.02378, audio_tagging_loss=0.00894, over 15360.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08947, pruned_loss=0.01274, audio_tagging_loss=0.008735, over 3048979.41 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:16:21,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476700 2023-11-27 18:16:46,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2023-11-27 18:16:48,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.657e+01 9.352e+01 9.918e+01 1.309e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-27 18:16:54,647 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7800, loss[loss=0.07235, simple_loss=0.09438, pruned_loss=0.01412, audio_tagging_loss=0.01105, over 14526.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.0905, pruned_loss=0.01278, audio_tagging_loss=0.008813, over 3050483.39 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:17:02,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3178186.6666666665, ans=0.125 2023-11-27 18:17:14,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.00 vs. limit=10.0 2023-11-27 18:17:15,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3178253.3333333335, ans=0.125 2023-11-27 18:17:18,155 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476750 2023-11-27 18:17:20,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3178320.0, ans=0.125 2023-11-27 18:17:32,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2023-11-27 18:17:33,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3178386.6666666665, ans=0.025 2023-11-27 18:17:43,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3178453.3333333335, ans=0.035 2023-11-27 18:17:50,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3178453.3333333335, ans=0.125 2023-11-27 18:17:51,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.13 vs. limit=10.0 2023-11-27 18:17:53,014 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7850, loss[loss=0.05576, simple_loss=0.06921, pruned_loss=0.008955, audio_tagging_loss=0.01221, over 15209.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08994, pruned_loss=0.01266, audio_tagging_loss=0.008955, over 3036957.76 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:18:03,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3178586.6666666665, ans=0.0 2023-11-27 18:18:05,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. 
limit=15.0 2023-11-27 18:18:07,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3178586.6666666665, ans=0.0 2023-11-27 18:18:10,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3178586.6666666665, ans=0.1 2023-11-27 18:18:15,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476800 2023-11-27 18:18:29,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3178720.0, ans=0.125 2023-11-27 18:18:40,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3178786.6666666665, ans=0.0 2023-11-27 18:18:44,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.789e+01 9.389e+01 9.930e+01 1.229e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 18:18:50,074 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7900, loss[loss=0.09062, simple_loss=0.1166, pruned_loss=0.02508, audio_tagging_loss=0.007244, over 14743.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08983, pruned_loss=0.01267, audio_tagging_loss=0.009029, over 3038947.30 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:18:51,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3178853.3333333335, ans=0.07 2023-11-27 18:18:54,741 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:19:13,029 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476850 2023-11-27 18:19:16,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=10.0 2023-11-27 18:19:24,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3179053.3333333335, ans=0.1 2023-11-27 18:19:47,738 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7950, loss[loss=0.09861, simple_loss=0.1216, pruned_loss=0.02541, audio_tagging_loss=0.01237, over 15667.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.08992, pruned_loss=0.01263, audio_tagging_loss=0.009154, over 3041104.31 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:19:59,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3179253.3333333335, ans=0.2 2023-11-27 18:20:05,966 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 18:20:10,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3179320.0, ans=0.125 2023-11-27 18:20:11,366 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476900 2023-11-27 18:20:28,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3179386.6666666665, ans=0.1 2023-11-27 18:20:30,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3179386.6666666665, ans=0.0 2023-11-27 18:20:32,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2023-11-27 18:20:37,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5 2023-11-27 18:20:39,448 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.819e+01 9.346e+01 1.021e+02 1.484e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-27 18:20:44,943 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8000, loss[loss=0.0454, simple_loss=0.06461, pruned_loss=0.004903, audio_tagging_loss=0.008188, over 14548.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08966, pruned_loss=0.01245, audio_tagging_loss=0.009244, over 3042198.34 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:20:50,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3179520.0, ans=0.125 2023-11-27 18:20:52,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3179520.0, ans=0.1 2023-11-27 18:21:07,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3179653.3333333335, ans=0.0 2023-11-27 18:21:08,510 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476950 2023-11-27 18:21:16,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3179653.3333333335, ans=0.2 2023-11-27 18:21:22,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3179720.0, ans=0.0 2023-11-27 18:21:22,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3179720.0, ans=0.125 2023-11-27 18:21:39,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3179786.6666666665, ans=0.125 2023-11-27 18:21:42,472 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8050, loss[loss=0.08069, simple_loss=0.1173, pruned_loss=0.0144, audio_tagging_loss=0.007644, over 15712.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08794, pruned_loss=0.01212, audio_tagging_loss=0.009301, over 3036721.70 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:21:50,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3179853.3333333335, ans=0.125 2023-11-27 18:22:01,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3179920.0, ans=0.0 2023-11-27 18:22:01,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3179920.0, ans=0.125 2023-11-27 18:22:05,354 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477000 2023-11-27 18:22:09,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3179986.6666666665, ans=0.2 2023-11-27 18:22:12,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3179986.6666666665, ans=0.125 2023-11-27 18:22:15,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2023-11-27 18:22:20,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3180053.3333333335, ans=0.125 2023-11-27 18:22:29,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3180120.0, ans=0.1 2023-11-27 18:22:35,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.539e+01 9.239e+01 9.821e+01 1.214e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 18:22:36,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3180120.0, ans=0.0 2023-11-27 18:22:39,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3180186.6666666665, ans=0.2 2023-11-27 18:22:39,954 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8100, loss[loss=0.07144, simple_loss=0.1149, pruned_loss=0.0102, audio_tagging_loss=0.003792, over 14857.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08783, pruned_loss=0.01212, audio_tagging_loss=0.009153, over 3023452.50 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:22:53,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3180253.3333333335, ans=0.0 2023-11-27 18:22:59,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. limit=10.0 2023-11-27 18:23:03,609 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477050 2023-11-27 18:23:33,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.92 vs. limit=10.0 2023-11-27 18:23:36,913 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8150, loss[loss=0.06566, simple_loss=0.09596, pruned_loss=0.01004, audio_tagging_loss=0.007641, over 15418.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.0877, pruned_loss=0.01212, audio_tagging_loss=0.008994, over 3029076.17 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:23:39,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.88 vs. 
limit=12.0 2023-11-27 18:23:43,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3180520.0, ans=0.125 2023-11-27 18:24:00,069 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477100 2023-11-27 18:24:29,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.482e+01 9.156e+01 9.778e+01 1.274e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-27 18:24:34,779 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8200, loss[loss=0.05188, simple_loss=0.06658, pruned_loss=0.01016, audio_tagging_loss=0.008437, over 15472.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08807, pruned_loss=0.01226, audio_tagging_loss=0.008907, over 3035167.73 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:24:37,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3180853.3333333335, ans=0.0 2023-11-27 18:24:39,145 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:24:41,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3180853.3333333335, ans=0.2 2023-11-27 18:24:50,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=12.0 2023-11-27 18:24:56,935 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477150 2023-11-27 18:25:16,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3181053.3333333335, ans=0.0 2023-11-27 18:25:31,876 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8250, loss[loss=0.05412, simple_loss=0.06701, pruned_loss=0.009276, audio_tagging_loss=0.01134, over 14229.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08916, pruned_loss=0.01244, audio_tagging_loss=0.008771, over 3037775.11 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:25:54,886 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477200 2023-11-27 18:26:12,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3181386.6666666665, ans=0.1 2023-11-27 18:26:24,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.573e+01 9.130e+01 1.008e+02 1.998e+02, threshold=1.826e+02, percent-clipped=1.0 2023-11-27 18:26:29,285 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8300, loss[loss=0.0751, simple_loss=0.09961, pruned_loss=0.01904, audio_tagging_loss=0.006253, over 14871.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08988, pruned_loss=0.01256, audio_tagging_loss=0.008671, over 3042278.73 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:26:33,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3181520.0, ans=0.125 2023-11-27 18:26:45,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=22.5 2023-11-27 18:26:49,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-11-27 18:26:50,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3181586.6666666665, ans=0.5 2023-11-27 18:26:52,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477250 2023-11-27 18:26:56,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3181653.3333333335, ans=0.07 2023-11-27 18:27:21,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3181786.6666666665, ans=0.125 2023-11-27 18:27:21,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3181786.6666666665, ans=0.125 2023-11-27 18:27:23,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3181786.6666666665, ans=0.0 2023-11-27 18:27:26,435 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8350, loss[loss=0.06542, simple_loss=0.09098, pruned_loss=0.0134, audio_tagging_loss=0.006523, over 14649.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09079, pruned_loss=0.01276, audio_tagging_loss=0.008554, over 3037130.34 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:27:43,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3181920.0, ans=0.1 2023-11-27 18:27:48,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3181986.6666666665, ans=0.125 2023-11-27 18:27:49,242 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477300 2023-11-27 18:28:09,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3182053.3333333335, ans=0.035 2023-11-27 18:28:17,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3182120.0, ans=0.125 2023-11-27 18:28:19,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.748e+01 9.541e+01 1.013e+02 1.320e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 18:28:22,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3182186.6666666665, ans=0.2 2023-11-27 18:28:23,394 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8400, loss[loss=0.06606, simple_loss=0.08779, pruned_loss=0.01239, audio_tagging_loss=0.009778, over 16622.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09109, pruned_loss=0.0129, audio_tagging_loss=0.008534, over 3043564.83 frames. 
], batch size: 63, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:28:46,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477350 2023-11-27 18:28:55,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.81 vs. limit=10.0 2023-11-27 18:29:06,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3182386.6666666665, ans=0.0 2023-11-27 18:29:20,938 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8450, loss[loss=0.0522, simple_loss=0.06603, pruned_loss=0.008339, audio_tagging_loss=0.01084, over 15319.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08987, pruned_loss=0.01262, audio_tagging_loss=0.008661, over 3043387.12 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:29:43,511 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477400 2023-11-27 18:29:50,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3182653.3333333335, ans=0.125 2023-11-27 18:30:12,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3182786.6666666665, ans=0.125 2023-11-27 18:30:13,791 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.652e+01 9.208e+01 1.012e+02 1.151e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 18:30:18,879 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8500, loss[loss=0.05632, simple_loss=0.0778, pruned_loss=0.009314, audio_tagging_loss=0.008109, over 15109.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08927, pruned_loss=0.01252, audio_tagging_loss=0.008791, over 3040614.41 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:30:37,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3182920.0, ans=0.125 2023-11-27 18:30:39,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3182920.0, ans=0.1 2023-11-27 18:30:42,030 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477450 2023-11-27 18:30:55,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3183053.3333333335, ans=0.2 2023-11-27 18:31:03,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=22.5 2023-11-27 18:31:16,558 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8550, loss[loss=0.08207, simple_loss=0.1048, pruned_loss=0.0216, audio_tagging_loss=0.008065, over 15444.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08935, pruned_loss=0.01244, audio_tagging_loss=0.008701, over 3044396.93 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:31:24,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3183186.6666666665, ans=0.125 2023-11-27 18:31:39,293 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477500 2023-11-27 18:31:54,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3183386.6666666665, ans=10.0 2023-11-27 18:31:54,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0 2023-11-27 18:31:59,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3183386.6666666665, ans=0.2 2023-11-27 18:32:09,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.725e+01 9.304e+01 1.021e+02 1.373e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 18:32:13,964 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8600, loss[loss=0.05234, simple_loss=0.06063, pruned_loss=0.01062, audio_tagging_loss=0.0114, over 14663.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08857, pruned_loss=0.01244, audio_tagging_loss=0.008844, over 3046825.39 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:32:24,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.30 vs. limit=12.0 2023-11-27 18:32:33,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3183586.6666666665, ans=0.125 2023-11-27 18:32:36,430 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477550 2023-11-27 18:32:43,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3183653.3333333335, ans=0.1 2023-11-27 18:32:44,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3183653.3333333335, ans=0.04949747468305833 2023-11-27 18:32:59,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-27 18:33:01,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3183786.6666666665, ans=0.0 2023-11-27 18:33:07,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3183786.6666666665, ans=0.125 2023-11-27 18:33:11,371 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8650, loss[loss=0.04874, simple_loss=0.0634, pruned_loss=0.008503, audio_tagging_loss=0.008541, over 14469.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08894, pruned_loss=0.01255, audio_tagging_loss=0.008852, over 3045341.15 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:33:17,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3183853.3333333335, ans=0.1 2023-11-27 18:33:23,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3183920.0, ans=0.0 2023-11-27 18:33:24,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3183920.0, ans=0.125 2023-11-27 18:33:26,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3183920.0, ans=0.125 2023-11-27 18:33:28,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3183920.0, ans=0.125 2023-11-27 18:33:34,132 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477600 2023-11-27 18:34:02,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3184120.0, ans=0.0 2023-11-27 18:34:04,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.946e+01 9.500e+01 1.005e+02 1.406e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 18:34:05,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3184120.0, ans=0.125 2023-11-27 18:34:08,451 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8700, loss[loss=0.07889, simple_loss=0.1116, pruned_loss=0.01388, audio_tagging_loss=0.009221, over 15406.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08982, pruned_loss=0.01276, audio_tagging_loss=0.008895, over 3050580.22 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:34:14,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2023-11-27 18:34:31,877 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477650 2023-11-27 18:35:05,919 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8750, loss[loss=0.07933, simple_loss=0.1085, pruned_loss=0.01728, audio_tagging_loss=0.007787, over 15087.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09125, pruned_loss=0.01285, audio_tagging_loss=0.00888, over 3050115.18 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:35:18,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3184586.6666666665, ans=0.2 2023-11-27 18:35:23,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3184586.6666666665, ans=0.125 2023-11-27 18:35:28,786 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477700 2023-11-27 18:35:32,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3184653.3333333335, ans=0.0 2023-11-27 18:35:35,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3184653.3333333335, ans=0.0 2023-11-27 18:35:41,538 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:35:42,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3184720.0, ans=0.125 2023-11-27 18:35:52,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3184786.6666666665, ans=0.0 2023-11-27 18:35:55,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3184786.6666666665, ans=0.1 2023-11-27 18:35:58,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.815e+01 9.414e+01 9.987e+01 1.374e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 18:36:03,873 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8800, loss[loss=0.06371, simple_loss=0.08154, pruned_loss=0.0136, audio_tagging_loss=0.009336, over 15096.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09063, pruned_loss=0.01275, audio_tagging_loss=0.008977, over 3047691.29 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 18:36:06,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3184853.3333333335, ans=0.2 2023-11-27 18:36:26,047 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477750 2023-11-27 18:36:36,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3185053.3333333335, ans=0.1 2023-11-27 18:36:52,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3185120.0, ans=0.1 2023-11-27 18:36:52,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3185120.0, ans=0.125 2023-11-27 18:37:00,076 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8850, loss[loss=0.07874, simple_loss=0.1131, pruned_loss=0.01512, audio_tagging_loss=0.007097, over 14444.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09099, pruned_loss=0.01278, audio_tagging_loss=0.009014, over 3046891.07 frames. ], batch size: 52, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:37:00,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2023-11-27 18:37:14,840 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:37:19,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3185253.3333333335, ans=0.0 2023-11-27 18:37:21,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3185253.3333333335, ans=0.125 2023-11-27 18:37:23,595 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477800 2023-11-27 18:37:54,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.632e+01 9.430e+01 1.040e+02 1.292e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 18:37:57,545 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8900, loss[loss=0.08632, simple_loss=0.1227, pruned_loss=0.01987, audio_tagging_loss=0.005093, over 15551.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09138, pruned_loss=0.01276, audio_tagging_loss=0.008819, over 3053099.27 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:37:59,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3185520.0, ans=0.0 2023-11-27 18:38:11,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-27 18:38:20,414 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477850 2023-11-27 18:38:22,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3185653.3333333335, ans=0.125 2023-11-27 18:38:47,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3185786.6666666665, ans=0.0 2023-11-27 18:38:48,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3185786.6666666665, ans=0.2 2023-11-27 18:38:54,430 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8950, loss[loss=0.08623, simple_loss=0.1186, pruned_loss=0.02102, audio_tagging_loss=0.005897, over 14485.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09088, pruned_loss=0.0126, audio_tagging_loss=0.008722, over 3050947.84 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:38:55,740 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:39:16,878 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477900 2023-11-27 18:39:18,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3185986.6666666665, ans=0.125 2023-11-27 18:39:24,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. 
limit=15.0 2023-11-27 18:39:32,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3186053.3333333335, ans=0.0 2023-11-27 18:39:49,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.925e+01 9.376e+01 9.837e+01 1.193e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 18:39:49,873 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:39:51,774 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9000, loss[loss=0.07416, simple_loss=0.1081, pruned_loss=0.01245, audio_tagging_loss=0.007628, over 15698.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.0908, pruned_loss=0.01258, audio_tagging_loss=0.008676, over 3052926.81 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:39:51,775 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 18:40:08,105 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([5.1600, 4.4313, 4.5127, 4.3492], device='cuda:3') 2023-11-27 18:40:27,258 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.05837, simple_loss=0.05058, pruned_loss=0.005173, audio_tagging_loss=0.02791, over 4681554.00 frames. 2023-11-27 18:40:27,259 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 18:40:50,070 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477950 2023-11-27 18:40:50,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3186320.0, ans=0.0 2023-11-27 18:41:07,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3186386.6666666665, ans=0.125 2023-11-27 18:41:17,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3186453.3333333335, ans=0.125 2023-11-27 18:41:22,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3186453.3333333335, ans=0.0 2023-11-27 18:41:23,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=22.5 2023-11-27 18:41:25,253 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9050, loss[loss=0.07826, simple_loss=0.1081, pruned_loss=0.01754, audio_tagging_loss=0.006681, over 14458.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09165, pruned_loss=0.01273, audio_tagging_loss=0.008626, over 3056998.06 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 4.0 2023-11-27 18:41:27,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3186520.0, ans=0.125 2023-11-27 18:41:30,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3186520.0, ans=15.0 2023-11-27 18:41:39,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3186586.6666666665, ans=0.2 2023-11-27 18:41:45,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.29 vs. 
limit=15.0 2023-11-27 18:41:47,758 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478000 2023-11-27 18:42:14,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3186786.6666666665, ans=0.2 2023-11-27 18:42:21,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 8.889e+01 9.370e+01 1.013e+02 1.191e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 18:42:22,733 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9100, loss[loss=0.07126, simple_loss=0.1022, pruned_loss=0.01308, audio_tagging_loss=0.00708, over 15757.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.0919, pruned_loss=0.01284, audio_tagging_loss=0.00853, over 3058432.97 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 8.0 2023-11-27 18:42:24,027 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:42:26,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2023-11-27 18:42:45,843 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478050 2023-11-27 18:42:50,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2023-11-27 18:43:02,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-27 18:43:07,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3187053.3333333335, ans=0.2 2023-11-27 18:43:07,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3187053.3333333335, ans=0.0 2023-11-27 18:43:12,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187120.0, ans=0.1 2023-11-27 18:43:20,517 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9150, loss[loss=0.05189, simple_loss=0.06882, pruned_loss=0.007407, audio_tagging_loss=0.01008, over 14289.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09077, pruned_loss=0.01278, audio_tagging_loss=0.008621, over 3049982.72 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:43:33,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187253.3333333335, ans=0.1 2023-11-27 18:43:36,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0 2023-11-27 18:43:44,069 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478100 2023-11-27 18:44:04,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187386.6666666665, ans=0.1 2023-11-27 18:44:08,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. 
limit=15.0 2023-11-27 18:44:11,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3187453.3333333335, ans=0.125 2023-11-27 18:44:12,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187453.3333333335, ans=0.1 2023-11-27 18:44:17,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.554e+01 9.287e+01 9.975e+01 1.548e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 18:44:18,384 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9200, loss[loss=0.06395, simple_loss=0.0785, pruned_loss=0.01427, audio_tagging_loss=0.01044, over 16095.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.0908, pruned_loss=0.01279, audio_tagging_loss=0.008596, over 3046647.07 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:44:19,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3187520.0, ans=0.125 2023-11-27 18:44:26,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3187520.0, ans=0.0 2023-11-27 18:44:39,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2023-11-27 18:44:40,994 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478150 2023-11-27 18:44:53,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3187720.0, ans=0.125 2023-11-27 18:45:05,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3187786.6666666665, ans=0.125 2023-11-27 18:45:13,739 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:45:13,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3187786.6666666665, ans=0.05 2023-11-27 18:45:15,782 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9250, loss[loss=0.07499, simple_loss=0.1072, pruned_loss=0.01148, audio_tagging_loss=0.009935, over 15415.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09007, pruned_loss=0.01264, audio_tagging_loss=0.008558, over 3049668.69 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:45:16,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=15.0 2023-11-27 18:45:23,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2023-11-27 18:45:25,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3187920.0, ans=0.125 2023-11-27 18:45:30,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187920.0, ans=0.1 2023-11-27 18:45:38,980 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478200 2023-11-27 18:46:03,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. 
limit=6.0 2023-11-27 18:46:09,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3188120.0, ans=0.125 2023-11-27 18:46:09,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2023-11-27 18:46:12,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.841e+01 9.296e+01 9.979e+01 1.330e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-27 18:46:13,712 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9300, loss[loss=0.08262, simple_loss=0.1243, pruned_loss=0.01428, audio_tagging_loss=0.006183, over 15618.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09, pruned_loss=0.01246, audio_tagging_loss=0.00853, over 3046403.03 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:46:18,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3188186.6666666665, ans=0.07 2023-11-27 18:46:24,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3188253.3333333335, ans=0.0 2023-11-27 18:46:26,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2023-11-27 18:46:32,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3188253.3333333335, ans=0.125 2023-11-27 18:46:37,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478250 2023-11-27 18:46:43,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3188320.0, ans=0.125 2023-11-27 18:46:48,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3188386.6666666665, ans=0.0 2023-11-27 18:46:49,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.77 vs. limit=15.0 2023-11-27 18:47:11,350 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9350, loss[loss=0.07027, simple_loss=0.09089, pruned_loss=0.01398, audio_tagging_loss=0.01084, over 14654.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09024, pruned_loss=0.01255, audio_tagging_loss=0.00859, over 3050926.95 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:47:13,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3188520.0, ans=0.025 2023-11-27 18:47:19,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3188520.0, ans=0.125 2023-11-27 18:47:31,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3188586.6666666665, ans=0.025 2023-11-27 18:47:32,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. 
limit=15.0 2023-11-27 18:47:34,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478300 2023-11-27 18:47:39,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3188653.3333333335, ans=0.125 2023-11-27 18:47:50,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3188720.0, ans=0.0 2023-11-27 18:48:08,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.625e+01 9.314e+01 1.018e+02 1.859e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 18:48:09,689 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9400, loss[loss=0.07378, simple_loss=0.1064, pruned_loss=0.01154, audio_tagging_loss=0.009034, over 16096.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08998, pruned_loss=0.01241, audio_tagging_loss=0.008746, over 3051981.98 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:48:17,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3188853.3333333335, ans=0.07 2023-11-27 18:48:17,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3188853.3333333335, ans=0.1 2023-11-27 18:48:25,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2023-11-27 18:48:31,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-27 18:48:32,739 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478350 2023-11-27 18:49:07,274 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9450, loss[loss=0.04453, simple_loss=0.04928, pruned_loss=0.005693, audio_tagging_loss=0.0142, over 15062.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.0898, pruned_loss=0.01231, audio_tagging_loss=0.008864, over 3049400.09 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:49:08,433 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:49:11,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3189186.6666666665, ans=0.09899494936611666 2023-11-27 18:49:17,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.00 vs. 
limit=15.0 2023-11-27 18:49:23,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3189253.3333333335, ans=0.125 2023-11-27 18:49:30,394 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478400 2023-11-27 18:49:30,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189320.0, ans=0.1 2023-11-27 18:49:30,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3189320.0, ans=0.0 2023-11-27 18:49:41,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3189386.6666666665, ans=0.0 2023-11-27 18:49:51,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3189386.6666666665, ans=0.125 2023-11-27 18:49:53,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3189453.3333333335, ans=0.0 2023-11-27 18:49:54,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3189453.3333333335, ans=0.2 2023-11-27 18:49:58,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3189453.3333333335, ans=0.2 2023-11-27 18:50:04,748 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.634e+01 9.375e+01 9.974e+01 1.335e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 18:50:04,774 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9500, loss[loss=0.07623, simple_loss=0.1082, pruned_loss=0.01322, audio_tagging_loss=0.008904, over 15183.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08972, pruned_loss=0.0124, audio_tagging_loss=0.008953, over 3038080.77 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:50:27,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3189653.3333333335, ans=0.2 2023-11-27 18:50:28,185 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478450 2023-11-27 18:50:28,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.07 vs. limit=15.0 2023-11-27 18:50:57,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3189786.6666666665, ans=0.2 2023-11-27 18:51:02,302 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9550, loss[loss=0.0599, simple_loss=0.08628, pruned_loss=0.009183, audio_tagging_loss=0.007575, over 13805.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08982, pruned_loss=0.01246, audio_tagging_loss=0.009044, over 3032834.39 frames. 
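Across these records the reported loss is consistent with a fixed linear blend of its components: loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. For the batch 9550 record above, 0.5 * 0.08628 + 0.009183 + 0.007575 = 0.05990, matching the logged loss=0.0599, and the tot_loss fields obey the same identity. The two scales are inferred from the logged numbers rather than read out of train_asr.py, so treat this as a sketch:

# Inferred reconstruction of the per-batch loss from its logged components.
# The 0.5 simple-loss scale and 1.0 audio-tagging scale are deductions from
# the numbers in this log, not values taken from the training code.
def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_scale=0.5, tagging_scale=1.0):
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Batch 9550: loss=0.0599, simple_loss=0.08628, pruned=0.009183, tagging=0.007575
assert abs(total_loss(0.08628, 0.009183, 0.007575) - 0.0599) < 5e-5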
], batch size: 53, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:51:02,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3189853.3333333335, ans=0.5 2023-11-27 18:51:04,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3189853.3333333335, ans=0.125 2023-11-27 18:51:05,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3189853.3333333335, ans=0.125 2023-11-27 18:51:08,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-27 18:51:22,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3189920.0, ans=0.0 2023-11-27 18:51:26,107 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478500 2023-11-27 18:51:45,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3190053.3333333335, ans=0.0 2023-11-27 18:51:59,904 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.341e+01 9.036e+01 9.952e+01 1.407e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 18:51:59,930 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9600, loss[loss=0.0661, simple_loss=0.09195, pruned_loss=0.01288, audio_tagging_loss=0.007251, over 15423.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09055, pruned_loss=0.01258, audio_tagging_loss=0.009078, over 3038861.52 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:52:23,547 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478550 2023-11-27 18:52:51,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3190453.3333333335, ans=0.125 2023-11-27 18:52:55,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3190453.3333333335, ans=0.05 2023-11-27 18:52:58,202 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9650, loss[loss=0.05761, simple_loss=0.07728, pruned_loss=0.009904, audio_tagging_loss=0.009064, over 14515.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.0905, pruned_loss=0.01264, audio_tagging_loss=0.009096, over 3038489.36 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:53:00,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3190520.0, ans=0.2 2023-11-27 18:53:06,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3190520.0, ans=0.125 2023-11-27 18:53:20,765 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478600 2023-11-27 18:53:33,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3190720.0, ans=0.0 2023-11-27 18:53:37,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3190720.0, ans=0.125 2023-11-27 18:53:46,831 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:53:49,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3190786.6666666665, ans=0.1 2023-11-27 18:53:52,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190786.6666666665, ans=0.1 2023-11-27 18:53:55,997 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.654e+01 9.418e+01 1.007e+02 1.330e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-27 18:53:56,022 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9700, loss[loss=0.06729, simple_loss=0.1004, pruned_loss=0.008222, audio_tagging_loss=0.008893, over 15755.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.0905, pruned_loss=0.01247, audio_tagging_loss=0.008912, over 3041193.05 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:54:13,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3190920.0, ans=0.125 2023-11-27 18:54:18,993 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478650 2023-11-27 18:54:22,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-11-27 18:54:42,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3191120.0, ans=0.125 2023-11-27 18:54:43,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3191120.0, ans=0.2 2023-11-27 18:54:52,950 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9750, loss[loss=0.06557, simple_loss=0.0882, pruned_loss=0.01177, audio_tagging_loss=0.009698, over 15032.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09052, pruned_loss=0.01261, audio_tagging_loss=0.00884, over 3039100.41 frames. 
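In each optim.py Clipping_scale line the threshold equals Clipping_scale times the middle of the five grad-norm quantiles: above, 2.0 * 9.418e+01 = 188.36, which is the logged 1.884e+02 at the printed precision, and the same identity holds for the other clipping lines in this section. A sketch of that rule, assuming the five values are min/25%/median/75%/max over a window of recent gradient norms:

import statistics

# Assumed reconstruction of the adaptive clipping threshold reported here:
# threshold = clipping_scale * median(recent grad norms). The window semantics
# are a guess; only the scale-times-median identity is checked against the log.
def clip_threshold(recent_grad_norms, clipping_scale=2.0):
    return clipping_scale * statistics.median(recent_grad_norms)

assert abs(clip_threshold([74.41, 86.54, 94.18, 100.7, 133.0]) - 188.36) < 1e-9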
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:54:53,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3191186.6666666665, ans=0.125 2023-11-27 18:55:01,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3191186.6666666665, ans=0.125 2023-11-27 18:55:11,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3191253.3333333335, ans=0.125 2023-11-27 18:55:16,947 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478700 2023-11-27 18:55:20,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3191320.0, ans=0.2 2023-11-27 18:55:31,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3191386.6666666665, ans=0.2 2023-11-27 18:55:51,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.742e+01 9.201e+01 9.783e+01 1.182e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-27 18:55:51,128 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9800, loss[loss=0.0752, simple_loss=0.1075, pruned_loss=0.01267, audio_tagging_loss=0.008769, over 14518.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08902, pruned_loss=0.01232, audio_tagging_loss=0.00889, over 3036919.67 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:55:54,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3191520.0, ans=0.0 2023-11-27 18:55:56,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191520.0, ans=0.1 2023-11-27 18:56:01,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3191586.6666666665, ans=0.1 2023-11-27 18:56:04,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3191586.6666666665, ans=0.1 2023-11-27 18:56:13,836 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478750 2023-11-27 18:56:39,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3191786.6666666665, ans=0.125 2023-11-27 18:56:45,434 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:56:48,638 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9850, loss[loss=0.06741, simple_loss=0.09649, pruned_loss=0.01264, audio_tagging_loss=0.006532, over 15254.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09038, pruned_loss=0.01253, audio_tagging_loss=0.008829, over 3046665.30 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:56:54,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. 
limit=15.0 2023-11-27 18:56:56,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0 2023-11-27 18:57:00,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3191920.0, ans=0.125 2023-11-27 18:57:01,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2023-11-27 18:57:07,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2023-11-27 18:57:08,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3191920.0, ans=0.2 2023-11-27 18:57:11,559 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478800 2023-11-27 18:57:16,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-27 18:57:28,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0 2023-11-27 18:57:33,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3192120.0, ans=0.125 2023-11-27 18:57:35,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=22.5 2023-11-27 18:57:45,659 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.871e+01 9.511e+01 1.009e+02 1.336e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 18:57:45,685 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9900, loss[loss=0.06827, simple_loss=0.09121, pruned_loss=0.01488, audio_tagging_loss=0.007782, over 15540.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.0909, pruned_loss=0.01263, audio_tagging_loss=0.008736, over 3049391.53 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:57:59,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5 2023-11-27 18:58:03,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3192253.3333333335, ans=0.0 2023-11-27 18:58:08,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2023-11-27 18:58:09,247 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478850 2023-11-27 18:58:15,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3192320.0, ans=0.125 2023-11-27 18:58:43,976 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9950, loss[loss=0.06401, simple_loss=0.09012, pruned_loss=0.0107, audio_tagging_loss=0.008256, over 16144.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09001, pruned_loss=0.01244, audio_tagging_loss=0.008725, over 3048187.65 frames. 
], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:58:45,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3192520.0, ans=0.0 2023-11-27 18:58:49,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-11-27 18:58:55,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3192586.6666666665, ans=0.125 2023-11-27 18:59:06,622 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478900 2023-11-27 18:59:27,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2023-11-27 18:59:35,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3192786.6666666665, ans=0.125 2023-11-27 18:59:41,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.506e+01 9.259e+01 9.823e+01 1.115e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 18:59:41,497 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10000, loss[loss=0.05587, simple_loss=0.07542, pruned_loss=0.008275, audio_tagging_loss=0.009892, over 16725.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08953, pruned_loss=0.01238, audio_tagging_loss=0.008665, over 3045513.29 frames. ], batch size: 64, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 18:59:43,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-27 18:59:50,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0 2023-11-27 19:00:00,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3192920.0, ans=0.2 2023-11-27 19:00:04,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478950 2023-11-27 19:00:07,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3192986.6666666665, ans=0.125 2023-11-27 19:00:12,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=22.5 2023-11-27 19:00:15,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3193053.3333333335, ans=0.0 2023-11-27 19:00:22,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3193053.3333333335, ans=0.125 2023-11-27 19:00:23,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3193053.3333333335, ans=0.125 2023-11-27 19:00:38,024 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10050, loss[loss=0.06109, simple_loss=0.08329, pruned_loss=0.009419, audio_tagging_loss=0.01002, over 15381.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09, pruned_loss=0.01255, audio_tagging_loss=0.008655, over 3048172.38 frames. 
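The grad_scale field halves when scaled gradients overflow (16.0 to 8.0 around batch 9500, 32.0 to 16.0 around batch 10100) and doubles back after a stretch of clean steps (8.0 to 16.0 at batch 9600, 16.0 to 32.0 at batch 10000). That is the standard dynamic loss-scaling behaviour of fp16 training; below is a generic sketch using PyTorch's AMP utilities, not the actual loop in train_asr.py:

import torch

# Generic mixed-precision step; init_scale mirrors the grad_scale values
# printed in this log, everything else is the stock torch.cuda.amp recipe.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(optimizer)         # skips the update if gradients overflowed
    scaler.update()                # halves the scale on overflow, grows it otherwise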
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:00:56,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-11-27 19:01:01,568 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479000 2023-11-27 19:01:29,048 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:01:32,833 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:01:35,903 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10100, loss[loss=0.06602, simple_loss=0.08914, pruned_loss=0.01276, audio_tagging_loss=0.008687, over 13975.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09094, pruned_loss=0.01265, audio_tagging_loss=0.008722, over 3047291.83 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:01:36,987 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.749e+01 9.301e+01 1.017e+02 1.197e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 19:01:59,523 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479050 2023-11-27 19:02:03,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.49 vs. limit=5.0 2023-11-27 19:02:12,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3193720.0, ans=0.125 2023-11-27 19:02:18,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3193720.0, ans=0.125 2023-11-27 19:02:25,395 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:02:25,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3193786.6666666665, ans=10.0 2023-11-27 19:02:33,754 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10150, loss[loss=0.05834, simple_loss=0.08427, pruned_loss=0.008915, audio_tagging_loss=0.007293, over 15516.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09036, pruned_loss=0.01249, audio_tagging_loss=0.008734, over 3045949.73 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:02:38,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3193853.3333333335, ans=0.09899494936611666 2023-11-27 19:02:39,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3193853.3333333335, ans=0.09899494936611666 2023-11-27 19:02:49,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=3193920.0, ans=15.0 2023-11-27 19:02:52,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3193920.0, ans=0.125 2023-11-27 19:02:56,489 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479100 2023-11-27 19:02:57,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3193986.6666666665, ans=0.0 2023-11-27 19:03:03,367 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:03:23,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2023-11-27 19:03:24,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3194120.0, ans=0.125 2023-11-27 19:03:31,551 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10200, loss[loss=0.05706, simple_loss=0.07467, pruned_loss=0.01112, audio_tagging_loss=0.008599, over 15989.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08889, pruned_loss=0.01229, audio_tagging_loss=0.008883, over 3047347.75 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:03:32,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.681e+01 9.288e+01 9.961e+01 1.325e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 19:03:54,391 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479150 2023-11-27 19:03:57,176 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:04:28,872 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10250, loss[loss=0.06515, simple_loss=0.07937, pruned_loss=0.01395, audio_tagging_loss=0.01151, over 14858.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08842, pruned_loss=0.01231, audio_tagging_loss=0.008933, over 3043242.91 frames. 
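The Exclude-cut WARNINGs in this stretch all reject the same kind of item: a 1-second AudioSet clip (IDs like unbalanced/..._0.000_1.000.wav) carrying the dummy placeholder transcript, which yields 23 frames after subsampling against 24 BPE tokens. The drop is presumably because the pruned transducer loss needs at least one encoder frame per target token to admit a valid alignment. A hypothetical version of that filter; the real check lives in train_asr.py:

# Hypothetical filter reproducing the Exclude-cut warnings above.
def is_trainable(num_frames_after_subsampling, tokens):
    # A transducer alignment path cannot emit more non-blank tokens than it
    # has encoder frames, so the token count must not exceed the frame count.
    return num_frames_after_subsampling >= len(tokens)

# Excluded cuts above: 100 raw frames -> 23 after subsampling, but 24 tokens.
assert is_trainable(23, ["tok"] * 24) is False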
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:04:39,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3194586.6666666665, ans=0.125 2023-11-27 19:04:43,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.33 vs. limit=10.0 2023-11-27 19:04:44,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3194586.6666666665, ans=0.0 2023-11-27 19:04:52,721 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479200 2023-11-27 19:05:04,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3194720.0, ans=0.0 2023-11-27 19:05:04,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3194720.0, ans=0.0 2023-11-27 19:05:20,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3194786.6666666665, ans=0.125 2023-11-27 19:05:27,479 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10300, loss[loss=0.0717, simple_loss=0.1008, pruned_loss=0.01287, audio_tagging_loss=0.008411, over 14689.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08862, pruned_loss=0.01237, audio_tagging_loss=0.008967, over 3046417.02 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:05:28,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.814e+01 9.491e+01 9.959e+01 1.329e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:05:39,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3194920.0, ans=0.125 2023-11-27 19:05:40,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3194920.0, ans=0.125 2023-11-27 19:05:46,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3194920.0, ans=0.125 2023-11-27 19:05:47,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3194920.0, ans=0.1 2023-11-27 19:05:50,036 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479250 2023-11-27 19:05:57,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3194986.6666666665, ans=0.125 2023-11-27 19:06:02,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3195053.3333333335, ans=0.125 2023-11-27 19:06:06,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3195053.3333333335, ans=0.125 2023-11-27 19:06:24,329 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10350, loss[loss=0.06383, simple_loss=0.08058, pruned_loss=0.01228, audio_tagging_loss=0.01126, over 16013.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08899, pruned_loss=0.01248, audio_tagging_loss=0.008939, over 3047198.01 frames. 
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:06:31,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3195186.6666666665, ans=0.125 2023-11-27 19:06:47,554 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479300 2023-11-27 19:07:03,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3195386.6666666665, ans=0.0 2023-11-27 19:07:15,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=15.0 2023-11-27 19:07:21,712 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10400, loss[loss=0.07365, simple_loss=0.1044, pruned_loss=0.01485, audio_tagging_loss=0.006602, over 15754.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08914, pruned_loss=0.01255, audio_tagging_loss=0.009061, over 3044206.67 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:07:24,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 8.831e+01 9.287e+01 1.004e+02 1.358e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 19:07:45,842 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479350 2023-11-27 19:08:04,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3195720.0, ans=0.125 2023-11-27 19:08:07,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2023-11-27 19:08:14,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3195786.6666666665, ans=0.125 2023-11-27 19:08:19,753 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10450, loss[loss=0.07094, simple_loss=0.09257, pruned_loss=0.01694, audio_tagging_loss=0.007717, over 14513.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08908, pruned_loss=0.01241, audio_tagging_loss=0.009107, over 3043562.53 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:08:26,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3195853.3333333335, ans=0.125 2023-11-27 19:08:30,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3195853.3333333335, ans=0.2 2023-11-27 19:08:32,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3195920.0, ans=0.05 2023-11-27 19:08:43,070 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479400 2023-11-27 19:08:51,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3195986.6666666665, ans=0.0 2023-11-27 19:08:56,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3196053.3333333335, ans=0.125 2023-11-27 19:09:18,580 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10500, loss[loss=0.06852, simple_loss=0.09331, pruned_loss=0.01387, audio_tagging_loss=0.007992, over 16175.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08972, pruned_loss=0.01244, audio_tagging_loss=0.00884, over 3045864.45 frames. 
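The scaling.py:213 lines track ScheduledFloat values: regularizer knobs (balancer probabilities and bounds, skip rates, dropout rates, bypass scale floors) whose current value ans is a function of the global batch_count. A minimal sketch of a piecewise-linear schedule of that shape; the breakpoints are invented for illustration and icefall's actual ScheduledFloat is not shown in this log:

# Piecewise-linear float schedule, sketched to match the (batch_count, ans)
# pairs printed above. Breakpoints below are illustrative, not icefall's.
class PiecewiseLinearFloat:
    def __init__(self, *points):
        self.points = sorted(points)      # (batch_count, value) pairs

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:   # interpolate inside this span
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

prob = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.125))
assert prob(3195186.67) == 0.125          # the 'ans' most balancer.prob lines show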
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:09:20,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.582e+01 9.246e+01 1.004e+02 1.274e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 19:09:25,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3196186.6666666665, ans=0.0 2023-11-27 19:09:30,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-11-27 19:09:34,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=22.5 2023-11-27 19:09:41,843 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479450 2023-11-27 19:09:52,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-11-27 19:09:55,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=12.0 2023-11-27 19:09:58,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3196386.6666666665, ans=0.5 2023-11-27 19:10:07,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3196453.3333333335, ans=0.125 2023-11-27 19:10:10,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2023-11-27 19:10:12,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3196453.3333333335, ans=0.2 2023-11-27 19:10:16,046 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10550, loss[loss=0.05754, simple_loss=0.08086, pruned_loss=0.007933, audio_tagging_loss=0.009173, over 15870.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08937, pruned_loss=0.01237, audio_tagging_loss=0.00881, over 3037235.36 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:10:23,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3196520.0, ans=0.125 2023-11-27 19:10:24,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3196520.0, ans=0.0 2023-11-27 19:10:27,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3196586.6666666665, ans=0.09899494936611666 2023-11-27 19:10:39,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479500 2023-11-27 19:10:58,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-27 19:11:13,701 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10600, loss[loss=0.06414, simple_loss=0.09522, pruned_loss=0.01081, audio_tagging_loss=0.005727, over 16147.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08934, pruned_loss=0.01235, audio_tagging_loss=0.008768, over 3038707.85 frames. 
], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:11:15,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.628e+01 9.441e+01 1.014e+02 1.251e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 19:11:16,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3196853.3333333335, ans=0.125 2023-11-27 19:11:26,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3196920.0, ans=0.07 2023-11-27 19:11:36,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3196986.6666666665, ans=0.1 2023-11-27 19:11:36,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3196986.6666666665, ans=0.125 2023-11-27 19:11:36,958 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479550 2023-11-27 19:11:53,978 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2023-11-27 19:11:55,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.16 vs. limit=10.0 2023-11-27 19:12:07,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3197120.0, ans=0.0 2023-11-27 19:12:09,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3197120.0, ans=0.0 2023-11-27 19:12:11,333 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10650, loss[loss=0.08241, simple_loss=0.1181, pruned_loss=0.01498, audio_tagging_loss=0.008352, over 14693.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08879, pruned_loss=0.01221, audio_tagging_loss=0.008754, over 3036467.93 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:12:16,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-27 19:12:34,478 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479600 2023-11-27 19:12:34,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3197320.0, ans=0.0 2023-11-27 19:13:08,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. limit=10.0 2023-11-27 19:13:09,081 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10700, loss[loss=0.05965, simple_loss=0.0876, pruned_loss=0.009737, audio_tagging_loss=0.006108, over 14384.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08911, pruned_loss=0.01226, audio_tagging_loss=0.008795, over 3037259.67 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:13:11,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.717e+01 9.252e+01 9.839e+01 1.176e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 19:13:16,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.04 vs. 
limit=15.0 2023-11-27 19:13:21,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3197586.6666666665, ans=0.125 2023-11-27 19:13:23,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3197586.6666666665, ans=0.125 2023-11-27 19:13:32,692 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479650 2023-11-27 19:13:40,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3197653.3333333335, ans=0.1 2023-11-27 19:13:41,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3197653.3333333335, ans=0.125 2023-11-27 19:13:41,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197653.3333333335, ans=0.1 2023-11-27 19:13:42,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3197720.0, ans=0.1 2023-11-27 19:13:46,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3197720.0, ans=0.125 2023-11-27 19:13:55,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3197786.6666666665, ans=0.125 2023-11-27 19:14:01,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-27 19:14:06,955 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10750, loss[loss=0.07291, simple_loss=0.0992, pruned_loss=0.01466, audio_tagging_loss=0.008647, over 15601.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08908, pruned_loss=0.01228, audio_tagging_loss=0.008714, over 3045910.10 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:14:15,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=22.5 2023-11-27 19:14:19,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3197920.0, ans=0.0 2023-11-27 19:14:23,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3197920.0, ans=0.0 2023-11-27 19:14:29,487 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479700 2023-11-27 19:14:32,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3197986.6666666665, ans=0.0 2023-11-27 19:14:36,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3197986.6666666665, ans=0.0 2023-11-27 19:14:55,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3198120.0, ans=0.125 2023-11-27 19:15:04,568 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10800, loss[loss=0.05943, simple_loss=0.07626, pruned_loss=0.01173, audio_tagging_loss=0.009561, over 14266.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08883, pruned_loss=0.01224, audio_tagging_loss=0.008797, over 3040967.43 frames. 
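The Whitening lines report a measured covariance metric against a scheduled limit (metric=7.04 vs. limit=15.0 for nonlin_attention.whiten2 just above), which suggests the whitening constraint only engages once the metric crosses its limit. A sketch of that gating; the penalty itself is left abstract because scaling.py's formulation is not shown in this log:

# Gating implied by the 'metric=X vs. limit=Y' whitening lines. penalty_fn
# is a placeholder here, not the actual whitening loss.
def whitening_penalty(metric, limit, penalty_fn=lambda m, l: m - l):
    if metric <= limit:
        return 0.0
    return penalty_fn(metric, limit)

assert whitening_penalty(7.04, 15.0) == 0.0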
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:15:06,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.494e+01 9.274e+01 9.978e+01 1.190e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 19:15:07,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3198186.6666666665, ans=0.09899494936611666 2023-11-27 19:15:11,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3198186.6666666665, ans=0.125 2023-11-27 19:15:12,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3198186.6666666665, ans=0.1 2023-11-27 19:15:27,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479750 2023-11-27 19:15:44,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3198386.6666666665, ans=0.125 2023-11-27 19:15:53,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=15.0 2023-11-27 19:16:00,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3198520.0, ans=0.0 2023-11-27 19:16:02,281 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10850, loss[loss=0.05982, simple_loss=0.0769, pruned_loss=0.01135, audio_tagging_loss=0.01002, over 14860.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08876, pruned_loss=0.0123, audio_tagging_loss=0.008791, over 3039633.51 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:16:16,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3198586.6666666665, ans=0.0 2023-11-27 19:16:25,182 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479800 2023-11-27 19:16:34,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-11-27 19:16:46,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3198786.6666666665, ans=0.125 2023-11-27 19:16:48,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3198786.6666666665, ans=0.0 2023-11-27 19:16:54,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3198786.6666666665, ans=0.125 2023-11-27 19:17:00,009 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10900, loss[loss=0.05897, simple_loss=0.0824, pruned_loss=0.00972, audio_tagging_loss=0.008052, over 15567.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08908, pruned_loss=0.01245, audio_tagging_loss=0.008704, over 3045067.86 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:17:00,039 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 19:17:02,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.922e+01 9.500e+01 1.014e+02 1.176e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 19:17:04,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3198853.3333333335, ans=0.07 2023-11-27 19:17:08,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3198853.3333333335, ans=0.1 2023-11-27 19:17:22,605 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479850 2023-11-27 19:17:43,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.36 vs. limit=22.5 2023-11-27 19:17:57,550 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10950, loss[loss=0.07533, simple_loss=0.1071, pruned_loss=0.01502, audio_tagging_loss=0.006772, over 15306.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08955, pruned_loss=0.01259, audio_tagging_loss=0.008772, over 3042273.23 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:18:04,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3199186.6666666665, ans=0.125 2023-11-27 19:18:20,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479900 2023-11-27 19:18:25,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3199320.0, ans=0.125 2023-11-27 19:18:30,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-11-27 19:18:32,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3199386.6666666665, ans=0.0 2023-11-27 19:18:41,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3199386.6666666665, ans=0.025 2023-11-27 19:18:42,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3199453.3333333335, ans=0.04949747468305833 2023-11-27 19:18:51,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2023-11-27 19:18:52,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3199453.3333333335, ans=0.2 2023-11-27 19:18:54,493 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11000, loss[loss=0.0649, simple_loss=0.08252, pruned_loss=0.01266, audio_tagging_loss=0.01098, over 14180.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.0891, pruned_loss=0.01257, audio_tagging_loss=0.008968, over 3041824.57 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:18:54,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3199520.0, ans=0.0 2023-11-27 19:18:57,744 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.669e+01 9.375e+01 1.024e+02 1.386e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 19:19:04,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3199520.0, ans=0.0 2023-11-27 19:19:07,841 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:19:18,250 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479950 2023-11-27 19:19:26,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3199653.3333333335, ans=0.07 2023-11-27 19:19:33,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2023-11-27 19:19:38,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3199720.0, ans=0.0 2023-11-27 19:19:51,837 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11050, loss[loss=0.05955, simple_loss=0.07767, pruned_loss=0.009485, audio_tagging_loss=0.01123, over 16473.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08962, pruned_loss=0.01264, audio_tagging_loss=0.008957, over 3045978.25 frames. ], batch size: 63, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:19:53,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-11-27 19:20:02,599 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:20:15,013 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480000 2023-11-27 19:20:27,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3200053.3333333335, ans=0.5 2023-11-27 19:20:27,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-11-27 19:20:33,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3200053.3333333335, ans=0.125 2023-11-27 19:20:34,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3200053.3333333335, ans=0.0 2023-11-27 19:20:51,371 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11100, loss[loss=0.05782, simple_loss=0.07865, pruned_loss=0.008747, audio_tagging_loss=0.009749, over 14573.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09041, pruned_loss=0.0127, audio_tagging_loss=0.008987, over 3054683.79 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:20:51,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3200186.6666666665, ans=0.1 2023-11-27 19:20:56,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.813e+01 9.363e+01 1.015e+02 1.283e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 19:21:13,911 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480050 2023-11-27 19:21:18,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3200320.0, ans=0.0 2023-11-27 19:21:31,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2023-11-27 19:21:34,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3200386.6666666665, ans=0.05 2023-11-27 19:21:39,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3200453.3333333335, ans=0.2 2023-11-27 19:21:49,150 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11150, loss[loss=0.0727, simple_loss=0.09777, pruned_loss=0.01447, audio_tagging_loss=0.009346, over 14677.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.08993, pruned_loss=0.01262, audio_tagging_loss=0.009095, over 3055524.16 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:21:54,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3200520.0, ans=0.125 2023-11-27 19:22:12,462 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480100 2023-11-27 19:22:14,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2023-11-27 19:22:34,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3200786.6666666665, ans=0.05 2023-11-27 19:22:38,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3200786.6666666665, ans=10.0 2023-11-27 19:22:46,468 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11200, loss[loss=0.07353, simple_loss=0.1077, pruned_loss=0.01098, audio_tagging_loss=0.008678, over 17382.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08958, pruned_loss=0.01248, audio_tagging_loss=0.009156, over 3050285.04 frames. 
], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:22:51,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.755e+01 9.520e+01 1.002e+02 1.290e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 19:23:08,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3200920.0, ans=0.0 2023-11-27 19:23:10,195 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480150 2023-11-27 19:23:12,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3200986.6666666665, ans=0.125 2023-11-27 19:23:20,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3201053.3333333335, ans=0.125 2023-11-27 19:23:44,366 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11250, loss[loss=0.07027, simple_loss=0.102, pruned_loss=0.01165, audio_tagging_loss=0.007608, over 15927.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08996, pruned_loss=0.01242, audio_tagging_loss=0.009059, over 3053531.27 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:23:53,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3201186.6666666665, ans=0.2 2023-11-27 19:24:07,127 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480200 2023-11-27 19:24:08,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2023-11-27 19:24:10,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3201320.0, ans=0.125 2023-11-27 19:24:16,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3201320.0, ans=0.125 2023-11-27 19:24:30,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3201453.3333333335, ans=0.0 2023-11-27 19:24:42,361 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11300, loss[loss=0.07403, simple_loss=0.112, pruned_loss=0.01356, audio_tagging_loss=0.004465, over 16457.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08992, pruned_loss=0.01242, audio_tagging_loss=0.008944, over 3048013.70 frames. 
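[annotation] The scaling.py:213 lines each report the current value (ans) of one ScheduledFloat hyper-parameter, looked up at the module's batch_count; by batch_count ~3.2e6 most schedules have long since flattened to their final values (0.0 skip rates, 0.125 probs, and so on). A minimal sketch of such a piecewise-linear schedule; the breakpoints below are invented for illustration, not taken from the recipe:

```python
# A piecewise-linear float schedule keyed on batch_count, in the spirit
# of the ScheduledFloat values printed by scaling.py.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    """points: (batch_count, value) pairs sorted by batch_count."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches:
conv_skip_rate = [(0.0, 0.5), (20000.0, 0.0)]
print(scheduled_float(3200920.0, conv_skip_rate))  # 0.0, as in the ans=0.0 lines above
```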
], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:24:47,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.774e+01 9.523e+01 1.010e+02 1.222e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 19:24:52,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3201586.6666666665, ans=0.2 2023-11-27 19:24:52,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3201586.6666666665, ans=0.2 2023-11-27 19:25:03,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3201586.6666666665, ans=0.125 2023-11-27 19:25:04,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3201653.3333333335, ans=0.0 2023-11-27 19:25:05,801 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480250 2023-11-27 19:25:07,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2023-11-27 19:25:09,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3201653.3333333335, ans=0.0 2023-11-27 19:25:10,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3201653.3333333335, ans=0.125 2023-11-27 19:25:13,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3201653.3333333335, ans=0.125 2023-11-27 19:25:16,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3201720.0, ans=0.125 2023-11-27 19:25:29,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.50 vs. limit=22.5 2023-11-27 19:25:39,715 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11350, loss[loss=0.09203, simple_loss=0.1289, pruned_loss=0.0222, audio_tagging_loss=0.005392, over 15637.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08975, pruned_loss=0.01253, audio_tagging_loss=0.008817, over 3048921.62 frames. 
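[annotation] The Whitening lines compare a covariance statistic of a module's activations against a fixed limit (e.g. metric=19.50 vs. limit=22.5 above); the module intervenes only when the metric exceeds its limit, which is why the nearby WithLoss lines report loss-sum=0.000e+00. A rough sketch of one such metric, assuming it measures how far the activation covariance spectrum is from white; the exact definition in scaling.py may differ:

```python
import torch

# A whitening-style metric: mean squared eigenvalue of the feature
# covariance divided by the squared mean eigenvalue. It is 1.0 for
# perfectly white features and grows as the spectrum becomes lopsided.
def whitening_metric(x: torch.Tensor) -> float:
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

feats = torch.randn(1000, 384)  # near-white activations
print(whitening_metric(feats))  # small (~1.4): well below limits like 15.0 or 22.5
# A penalty would be applied only when the metric exceeds the limit.
```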
], batch size: 55, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:25:46,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3201853.3333333335, ans=0.125 2023-11-27 19:25:46,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3201853.3333333335, ans=0.125 2023-11-27 19:25:50,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3201920.0, ans=0.125 2023-11-27 19:25:54,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3201920.0, ans=0.0 2023-11-27 19:26:00,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3201920.0, ans=0.0 2023-11-27 19:26:02,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3201986.6666666665, ans=0.2 2023-11-27 19:26:03,198 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480300 2023-11-27 19:26:06,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0 2023-11-27 19:26:15,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2023-11-27 19:26:37,701 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11400, loss[loss=0.06076, simple_loss=0.08717, pruned_loss=0.01034, audio_tagging_loss=0.00684, over 16014.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08935, pruned_loss=0.01243, audio_tagging_loss=0.00871, over 3048387.42 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:26:43,650 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.795e+01 9.431e+01 1.004e+02 1.426e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 19:26:50,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5 2023-11-27 19:27:00,035 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480350 2023-11-27 19:27:09,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.58 vs. limit=10.0 2023-11-27 19:27:32,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3202453.3333333335, ans=0.125 2023-11-27 19:27:35,260 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11450, loss[loss=0.07076, simple_loss=0.0893, pruned_loss=0.01553, audio_tagging_loss=0.01059, over 14981.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08969, pruned_loss=0.01241, audio_tagging_loss=0.008702, over 3053527.11 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:27:40,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.19 vs. 
limit=15.0 2023-11-27 19:27:45,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3202586.6666666665, ans=0.125 2023-11-27 19:27:46,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3202586.6666666665, ans=0.125 2023-11-27 19:27:48,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3202586.6666666665, ans=0.5 2023-11-27 19:27:57,791 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480400 2023-11-27 19:28:22,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=15.0 2023-11-27 19:28:27,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3202786.6666666665, ans=0.125 2023-11-27 19:28:32,566 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11500, loss[loss=0.04833, simple_loss=0.06342, pruned_loss=0.007501, audio_tagging_loss=0.009124, over 15464.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08883, pruned_loss=0.01238, audio_tagging_loss=0.008781, over 3050276.52 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:28:32,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3202853.3333333335, ans=0.125 2023-11-27 19:28:38,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 9.056e+01 9.508e+01 1.039e+02 1.307e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 19:28:44,673 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:28:50,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3202920.0, ans=0.0 2023-11-27 19:28:53,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=22.5 2023-11-27 19:28:53,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=22.5 2023-11-27 19:28:56,689 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480450 2023-11-27 19:29:04,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3202986.6666666665, ans=0.125 2023-11-27 19:29:22,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3203120.0, ans=0.0 2023-11-27 19:29:30,419 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11550, loss[loss=0.07147, simple_loss=0.09986, pruned_loss=0.01441, audio_tagging_loss=0.007132, over 15081.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08917, pruned_loss=0.01254, audio_tagging_loss=0.008733, over 3048730.85 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 19:29:31,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.16 vs. 
limit=22.5 2023-11-27 19:29:36,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3203186.6666666665, ans=0.125 2023-11-27 19:29:40,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3203186.6666666665, ans=0.1 2023-11-27 19:29:53,562 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480500 2023-11-27 19:29:55,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3203320.0, ans=0.125 2023-11-27 19:29:59,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3203320.0, ans=0.125 2023-11-27 19:30:00,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3203320.0, ans=0.07 2023-11-27 19:30:04,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3203386.6666666665, ans=0.125 2023-11-27 19:30:05,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3203386.6666666665, ans=0.125 2023-11-27 19:30:09,432 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:30:28,571 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11600, loss[loss=0.07222, simple_loss=0.1013, pruned_loss=0.01334, audio_tagging_loss=0.008239, over 16407.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08956, pruned_loss=0.01254, audio_tagging_loss=0.008608, over 3050036.23 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:30:33,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.753e+01 9.625e+01 1.023e+02 1.677e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-27 19:30:44,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3203586.6666666665, ans=0.025 2023-11-27 19:30:44,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3203586.6666666665, ans=0.125 2023-11-27 19:30:45,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3203586.6666666665, ans=0.125 2023-11-27 19:30:50,942 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480550 2023-11-27 19:31:00,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3203653.3333333335, ans=0.2 2023-11-27 19:31:09,685 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:31:12,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. 
limit=6.0 2023-11-27 19:31:24,674 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11650, loss[loss=0.06971, simple_loss=0.09539, pruned_loss=0.01197, audio_tagging_loss=0.01004, over 14139.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08904, pruned_loss=0.0124, audio_tagging_loss=0.008632, over 3045098.41 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:31:43,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3203920.0, ans=0.125 2023-11-27 19:31:46,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3203986.6666666665, ans=0.025 2023-11-27 19:31:47,489 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480600 2023-11-27 19:31:51,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2023-11-27 19:32:13,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3204120.0, ans=0.0 2023-11-27 19:32:22,775 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11700, loss[loss=0.07819, simple_loss=0.1082, pruned_loss=0.01691, audio_tagging_loss=0.00719, over 16076.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.0889, pruned_loss=0.01245, audio_tagging_loss=0.008728, over 3041787.27 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:32:28,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.724e+01 9.258e+01 1.003e+02 1.518e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 19:32:31,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3204186.6666666665, ans=0.125 2023-11-27 19:32:39,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3204253.3333333335, ans=0.125 2023-11-27 19:32:45,809 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480650 2023-11-27 19:32:47,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3204320.0, ans=0.125 2023-11-27 19:32:56,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2023-11-27 19:32:58,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3204386.6666666665, ans=0.125 2023-11-27 19:33:09,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3204453.3333333335, ans=0.0 2023-11-27 19:33:20,257 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11750, loss[loss=0.06717, simple_loss=0.08933, pruned_loss=0.01593, audio_tagging_loss=0.006581, over 13958.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08917, pruned_loss=0.01249, audio_tagging_loss=0.008729, over 3040077.34 frames. 
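[annotation] The Exclude-cut WARNINGs in this log (one appears just above) all follow the same pattern: a one-second AudioSet clip carries the 24-token dummy transcript, but after the ~4x convolutional subsampling its 100 frames shrink to 23, fewer than the number of tokens, so no transducer alignment exists and the cut is dropped. A sketch of that filter; the exact subsampling arithmetic is an assumption that happens to reproduce 100 -> 23:

```python
# Sketch of the filter behind the "Exclude cut" warnings: a transducer
# needs at least one encoder frame per output token, so cuts whose
# subsampled length falls below the token count are skipped.
def subsampled_len(num_frames: int) -> int:
    # One plausible conv front-end arithmetic for ~4x reduction;
    # it maps 100 -> 23 as in the warnings.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return subsampled_len(num_frames) >= num_tokens

print(subsampled_len(100))  # 23
print(keep_cut(100, 24))    # False -> excluded from training
```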
], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:33:31,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3204586.6666666665, ans=0.07 2023-11-27 19:33:39,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3204586.6666666665, ans=0.125 2023-11-27 19:33:43,490 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480700 2023-11-27 19:33:45,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3204653.3333333335, ans=0.2 2023-11-27 19:33:47,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3204653.3333333335, ans=0.2 2023-11-27 19:34:18,070 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11800, loss[loss=0.06484, simple_loss=0.08455, pruned_loss=0.01292, audio_tagging_loss=0.00965, over 15533.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08984, pruned_loss=0.0125, audio_tagging_loss=0.008737, over 3043466.88 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:34:23,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.512e+01 9.140e+01 9.806e+01 1.375e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-27 19:34:29,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0 2023-11-27 19:34:32,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2023-11-27 19:34:40,886 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480750 2023-11-27 19:34:48,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-27 19:34:57,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3205053.3333333335, ans=0.0 2023-11-27 19:35:07,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2023-11-27 19:35:12,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3205120.0, ans=0.125 2023-11-27 19:35:15,436 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11850, loss[loss=0.05721, simple_loss=0.07727, pruned_loss=0.009588, audio_tagging_loss=0.008986, over 15620.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08997, pruned_loss=0.01247, audio_tagging_loss=0.008831, over 3046863.83 frames. 
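[annotation] The grad_scale field in the batch lines moves between 8, 16 and 32 across this stretch; that is fp16 dynamic loss scaling at work: the scale is halved whenever a step overflows and doubled again after a long enough run of clean steps. A toy version of that policy; the growth interval of 2000 is PyTorch's GradScaler default, assumed rather than read from this recipe:

```python
# Toy fp16 dynamic loss scaling: halve on overflow, double after
# `growth_interval` consecutive good steps. This is the mechanism that
# makes grad_scale hop between 8, 16 and 32 in the batch lines above.
class ToyGradScaler:
    def __init__(self, scale: float = 16.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale /= 2.0      # back off after an overflowing step
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= 2.0  # cautiously grow again
                self._good_steps = 0

scaler = ToyGradScaler(scale=16.0)
scaler.update(found_inf=True)
print(scaler.scale)  # 8.0 -- the kind of drop seen around batch 11050
```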
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:35:20,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3205186.6666666665, ans=0.0 2023-11-27 19:35:37,049 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:35:37,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3205253.3333333335, ans=0.2 2023-11-27 19:35:39,084 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480800 2023-11-27 19:35:44,003 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:35:57,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3205386.6666666665, ans=0.0 2023-11-27 19:36:10,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3205453.3333333335, ans=0.0 2023-11-27 19:36:13,821 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11900, loss[loss=0.08025, simple_loss=0.1045, pruned_loss=0.01722, audio_tagging_loss=0.01077, over 15207.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08999, pruned_loss=0.01252, audio_tagging_loss=0.008881, over 3045802.74 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:36:19,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.961e+01 9.645e+01 1.029e+02 1.669e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 19:36:19,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3205520.0, ans=0.2 2023-11-27 19:36:37,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480850 2023-11-27 19:36:43,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3205653.3333333335, ans=0.125 2023-11-27 19:37:02,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3205786.6666666665, ans=0.125 2023-11-27 19:37:05,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3205786.6666666665, ans=0.95 2023-11-27 19:37:08,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3205786.6666666665, ans=0.125 2023-11-27 19:37:11,380 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11950, loss[loss=0.0603, simple_loss=0.08497, pruned_loss=0.0105, audio_tagging_loss=0.007317, over 15033.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09003, pruned_loss=0.01243, audio_tagging_loss=0.008909, over 3047619.92 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:37:13,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3205853.3333333335, ans=0.2 2023-11-27 19:37:13,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3205853.3333333335, ans=0.125 2023-11-27 19:37:21,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3205920.0, ans=0.0 2023-11-27 19:37:27,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3205920.0, ans=0.0 2023-11-27 19:37:27,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3205920.0, ans=0.125 2023-11-27 19:37:34,356 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480900 2023-11-27 19:37:40,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3205986.6666666665, ans=0.0 2023-11-27 19:37:40,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3205986.6666666665, ans=0.0 2023-11-27 19:37:42,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3205986.6666666665, ans=0.1 2023-11-27 19:37:45,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3206053.3333333335, ans=0.125 2023-11-27 19:38:03,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2023-11-27 19:38:07,346 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 12000, loss[loss=0.06077, simple_loss=0.08197, pruned_loss=0.01085, audio_tagging_loss=0.008938, over 15189.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09072, pruned_loss=0.01253, audio_tagging_loss=0.009074, over 3050398.24 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:38:07,347 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 19:38:41,929 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.05781, simple_loss=0.05069, pruned_loss=0.005234, audio_tagging_loss=0.02723, over 4681554.00 frames. 2023-11-27 19:38:41,930 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 19:38:47,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.885e+01 9.490e+01 1.034e+02 1.237e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:38:49,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-27 19:38:55,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2023-11-27 19:39:02,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480950 2023-11-27 19:39:26,308 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 0, loss[loss=0.05578, simple_loss=0.05325, pruned_loss=0.004609, audio_tagging_loss=0.02455, over 15231.00 frames. 
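[annotation] Validation runs here at batch 12000 and again immediately at batch 0 of epoch 41 below; both are consistent with validating on every multiple of a fixed batch interval (batch 0 included). The interval of 3000 below is an assumption that fits the observed cadence:

```python
# Validation cadence suggested by the log: every valid_interval batches,
# which includes batch 0 of each epoch. The value 3000 is an assumption
# consistent with validation landing exactly on batch 12000.
valid_interval = 3000

def should_validate(batch_idx: int) -> bool:
    return batch_idx % valid_interval == 0

print([b for b in (0, 11950, 12000) if should_validate(b)])  # [0, 12000]
```

Note the validation loss itself is essentially flat across the epoch boundary (0.05781 -> 0.05782), with the audio-tagging term (~0.027) now the dominant component of the weighted sum.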
], tot_loss[loss=0.05578, simple_loss=0.05325, pruned_loss=0.004609, audio_tagging_loss=0.02455, over 15231.00 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:39:26,309 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 19:40:00,214 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05782, simple_loss=0.05064, pruned_loss=0.005197, audio_tagging_loss=0.0273, over 4681554.00 frames. 2023-11-27 19:40:00,214 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 19:40:18,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3206426.6666666665, ans=0.05 2023-11-27 19:40:21,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3206426.6666666665, ans=0.0 2023-11-27 19:40:45,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.79 vs. limit=12.0 2023-11-27 19:40:50,909 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481000 2023-11-27 19:40:57,761 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 50, loss[loss=0.05743, simple_loss=0.06207, pruned_loss=0.007919, audio_tagging_loss=0.01847, over 15381.00 frames. ], tot_loss[loss=0.07583, simple_loss=0.09194, pruned_loss=0.01262, audio_tagging_loss=0.01724, over 685444.28 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:40:59,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3206693.3333333335, ans=0.125 2023-11-27 19:41:06,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2023-11-27 19:41:13,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3206760.0, ans=0.1 2023-11-27 19:41:17,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3206760.0, ans=0.1 2023-11-27 19:41:31,553 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.368e+01 1.003e+02 1.103e+02 1.548e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-27 19:41:32,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3206893.3333333335, ans=0.0 2023-11-27 19:41:35,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3206893.3333333335, ans=0.125 2023-11-27 19:41:42,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.46 vs. limit=15.0 2023-11-27 19:41:48,694 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481050 2023-11-27 19:41:52,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3206960.0, ans=0.125 2023-11-27 19:41:55,739 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 100, loss[loss=0.06985, simple_loss=0.07539, pruned_loss=0.01241, audio_tagging_loss=0.01974, over 14526.00 frames. ], tot_loss[loss=0.07478, simple_loss=0.09137, pruned_loss=0.01248, audio_tagging_loss=0.01662, over 1204284.16 frames. 
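[annotation] The learning rate printed in the batch lines steps from 1.68e-03 during epoch 40 to 1.66e-03 as epoch 41 begins, while barely moving within an epoch. That is the signature of an Eden-style schedule: at ~4.8e5 steps the batch factor is essentially flat, so the visible change comes from the epoch term. A sketch assuming base lr 0.045, lr_batches 7500 and lr_epochs 3.5 from this run's configuration, with the epoch argument counting completed epochs:

```python
# Eden-style learning-rate schedule; the numbers below reproduce the
# 1.68e-03 -> 1.66e-03 step visible at the epoch 40/41 boundary.
base_lr, lr_batches, lr_epochs = 0.045, 7500.0, 3.5

def eden_lr(step: int, epoch: int) -> float:
    batch_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(481000, 39):.2e}")  # 1.68e-03, during epoch 40
print(f"{eden_lr(481000, 40):.2e}")  # 1.66e-03, during epoch 41
```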
], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:41:58,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3207026.6666666665, ans=0.1 2023-11-27 19:42:00,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3207026.6666666665, ans=0.125 2023-11-27 19:42:11,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3207093.3333333335, ans=0.2 2023-11-27 19:42:33,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3207226.6666666665, ans=0.125 2023-11-27 19:42:46,591 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481100 2023-11-27 19:42:53,769 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 150, loss[loss=0.06597, simple_loss=0.09437, pruned_loss=0.01015, audio_tagging_loss=0.008636, over 15543.00 frames. ], tot_loss[loss=0.07294, simple_loss=0.09091, pruned_loss=0.01259, audio_tagging_loss=0.0149, over 1607637.17 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:42:57,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3207360.0, ans=0.125 2023-11-27 19:42:59,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-27 19:43:03,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3207360.0, ans=0.125 2023-11-27 19:43:12,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3207426.6666666665, ans=0.125 2023-11-27 19:43:28,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.150e+01 9.036e+01 9.587e+01 1.014e+02 1.345e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 19:43:44,372 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481150 2023-11-27 19:43:51,436 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 200, loss[loss=0.0808, simple_loss=0.1061, pruned_loss=0.01699, audio_tagging_loss=0.01078, over 15308.00 frames. ], tot_loss[loss=0.0707, simple_loss=0.09012, pruned_loss=0.01247, audio_tagging_loss=0.01317, over 1927227.43 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:43:51,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3207693.3333333335, ans=0.0 2023-11-27 19:43:56,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3207693.3333333335, ans=0.1 2023-11-27 19:43:59,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3207693.3333333335, ans=0.05 2023-11-27 19:44:00,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3207693.3333333335, ans=0.2 2023-11-27 19:44:02,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. 
limit=12.0 2023-11-27 19:44:07,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3207760.0, ans=0.125 2023-11-27 19:44:17,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3207826.6666666665, ans=0.125 2023-11-27 19:44:33,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=22.5 2023-11-27 19:44:42,397 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481200 2023-11-27 19:44:49,819 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 250, loss[loss=0.06236, simple_loss=0.08956, pruned_loss=0.008332, audio_tagging_loss=0.009251, over 14944.00 frames. ], tot_loss[loss=0.07002, simple_loss=0.09112, pruned_loss=0.01256, audio_tagging_loss=0.0119, over 2176196.02 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:44:50,058 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:44:53,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.36 vs. limit=15.0 2023-11-27 19:45:20,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3208160.0, ans=0.125 2023-11-27 19:45:23,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 9.216e+01 9.866e+01 1.064e+02 1.717e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-27 19:45:28,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3208226.6666666665, ans=0.125 2023-11-27 19:45:31,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-27 19:45:40,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481250 2023-11-27 19:45:40,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3208293.3333333335, ans=0.0 2023-11-27 19:45:47,464 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 300, loss[loss=0.06906, simple_loss=0.09766, pruned_loss=0.01221, audio_tagging_loss=0.008013, over 16021.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.09123, pruned_loss=0.01257, audio_tagging_loss=0.01101, over 2374479.45 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:46:02,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3208426.6666666665, ans=0.125 2023-11-27 19:46:06,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2023-11-27 19:46:10,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.18 vs. 
limit=15.0 2023-11-27 19:46:13,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3208493.3333333335, ans=0.125 2023-11-27 19:46:22,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3208560.0, ans=0.125 2023-11-27 19:46:26,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3208560.0, ans=0.2 2023-11-27 19:46:33,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.34 vs. limit=10.0 2023-11-27 19:46:38,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481300 2023-11-27 19:46:44,689 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 350, loss[loss=0.07214, simple_loss=0.09673, pruned_loss=0.0122, audio_tagging_loss=0.01158, over 14624.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.09114, pruned_loss=0.01255, audio_tagging_loss=0.01038, over 2524572.79 frames. ], batch size: 52, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:46:59,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3208760.0, ans=0.0 2023-11-27 19:47:16,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3208826.6666666665, ans=0.0 2023-11-27 19:47:19,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.707e+01 9.326e+01 9.986e+01 1.163e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 19:47:19,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3208893.3333333335, ans=0.0 2023-11-27 19:47:35,509 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481350 2023-11-27 19:47:43,135 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 400, loss[loss=0.05391, simple_loss=0.06832, pruned_loss=0.006996, audio_tagging_loss=0.01275, over 16788.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09033, pruned_loss=0.01239, audio_tagging_loss=0.009974, over 2645521.83 frames. ], batch size: 62, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:47:43,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3209026.6666666665, ans=0.2 2023-11-27 19:47:44,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3209026.6666666665, ans=0.125 2023-11-27 19:47:47,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2023-11-27 19:48:02,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3209093.3333333335, ans=0.0 2023-11-27 19:48:26,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0 2023-11-27 19:48:33,298 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481400 2023-11-27 19:48:40,635 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 450, loss[loss=0.07084, simple_loss=0.1006, pruned_loss=0.01311, audio_tagging_loss=0.007453, over 14788.00 frames. 
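[annotation] Early in epoch 41 the tot_loss fields drift quickly (audio_tagging_loss 0.02455 at batch 0 down to 0.01038 by batch 350) while the per-batch losses do not: tot_loss appears to be a frame-weighted running average that is reset at the epoch boundary (at batch 0 it equals the batch loss exactly) and decayed each step, so the first few batches dominate until the "over N frames" weight fills up. A sketch assuming a decay of 1 - 1/200 per step, which reproduces both the ~685k frame count at batch 50 and the ~3.05e6-frame plateau seen mid-epoch:

```python
# Decayed, frame-weighted running average behind tot_loss. With decay
# 1 - 1/reset_interval the frame weight saturates near reset_interval
# times the average batch size, matching the "over N frames" plateau.
class RunningLoss:
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.weighted_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_sum = self.weighted_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.weighted_sum / self.frames

tracker = RunningLoss()
for _ in range(51):                 # roughly batch 50 of a fresh epoch
    tracker.update(0.075, 15200.0)
print(f"{tracker.frames:.0f}")      # ~685739, close to the 685444.28 at batch 50
for _ in range(2000):
    tracker.update(0.075, 15200.0)
print(f"{tracker.frames:.0f}")      # ~3.04e6, the mid-epoch plateau
```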
], tot_loss[loss=0.067, simple_loss=0.08958, pruned_loss=0.01238, audio_tagging_loss=0.009825, over 2732434.26 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:49:16,483 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.566e+01 9.069e+01 9.742e+01 1.634e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-27 19:49:30,338 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:49:31,312 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481450 2023-11-27 19:49:37,849 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 500, loss[loss=0.07333, simple_loss=0.09156, pruned_loss=0.01495, audio_tagging_loss=0.0126, over 14414.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.08955, pruned_loss=0.0125, audio_tagging_loss=0.009541, over 2804396.32 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:49:39,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=12.0 2023-11-27 19:49:59,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3209760.0, ans=0.125 2023-11-27 19:50:04,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3209826.6666666665, ans=0.0 2023-11-27 19:50:28,324 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481500 2023-11-27 19:50:36,103 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 550, loss[loss=0.06348, simple_loss=0.0914, pruned_loss=0.009705, audio_tagging_loss=0.008079, over 15941.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09008, pruned_loss=0.01252, audio_tagging_loss=0.009361, over 2860185.96 frames. ], batch size: 61, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:50:36,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3210026.6666666665, ans=0.125 2023-11-27 19:50:58,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3210160.0, ans=0.125 2023-11-27 19:50:59,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3210160.0, ans=0.0 2023-11-27 19:51:11,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.776e+01 9.400e+01 1.030e+02 1.375e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 19:51:11,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3210226.6666666665, ans=0.125 2023-11-27 19:51:17,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3210226.6666666665, ans=0.125 2023-11-27 19:51:21,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.12 vs. 
limit=22.5 2023-11-27 19:51:25,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3210293.3333333335, ans=0.2 2023-11-27 19:51:27,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481550 2023-11-27 19:51:33,512 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 600, loss[loss=0.05702, simple_loss=0.07081, pruned_loss=0.01288, audio_tagging_loss=0.008737, over 13449.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.0892, pruned_loss=0.01252, audio_tagging_loss=0.009228, over 2899159.45 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:51:37,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3210360.0, ans=10.0 2023-11-27 19:51:45,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3210426.6666666665, ans=0.04949747468305833 2023-11-27 19:52:05,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3210493.3333333335, ans=0.1 2023-11-27 19:52:25,158 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481600 2023-11-27 19:52:32,069 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 650, loss[loss=0.06184, simple_loss=0.08841, pruned_loss=0.0102, audio_tagging_loss=0.007445, over 14296.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09014, pruned_loss=0.0127, audio_tagging_loss=0.009134, over 2928101.85 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:52:35,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3210693.3333333335, ans=0.125 2023-11-27 19:52:37,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-11-27 19:53:09,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.855e+01 9.490e+01 1.019e+02 1.294e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:53:22,835 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481650 2023-11-27 19:53:24,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3210960.0, ans=0.07 2023-11-27 19:53:29,903 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 700, loss[loss=0.0462, simple_loss=0.05705, pruned_loss=0.00823, audio_tagging_loss=0.00944, over 15647.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08978, pruned_loss=0.0126, audio_tagging_loss=0.009081, over 2953530.97 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 8.0 2023-11-27 19:53:35,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.38 vs. 
limit=12.0 2023-11-27 19:53:48,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3211093.3333333335, ans=0.0 2023-11-27 19:53:49,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3211093.3333333335, ans=0.125 2023-11-27 19:53:56,977 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:54:12,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3211226.6666666665, ans=0.015 2023-11-27 19:54:15,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3211293.3333333335, ans=0.0 2023-11-27 19:54:20,575 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481700 2023-11-27 19:54:24,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3211293.3333333335, ans=0.2 2023-11-27 19:54:27,670 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 750, loss[loss=0.05856, simple_loss=0.07444, pruned_loss=0.01049, audio_tagging_loss=0.01085, over 13696.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09054, pruned_loss=0.01283, audio_tagging_loss=0.008958, over 2979464.26 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 8.0 2023-11-27 19:54:38,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0 2023-11-27 19:54:50,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3211493.3333333335, ans=0.0 2023-11-27 19:54:52,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-11-27 19:55:04,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.938e+01 9.552e+01 1.040e+02 1.357e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 19:55:19,150 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481750 2023-11-27 19:55:25,729 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 800, loss[loss=0.0804, simple_loss=0.1192, pruned_loss=0.01497, audio_tagging_loss=0.00584, over 15878.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09014, pruned_loss=0.01279, audio_tagging_loss=0.008977, over 2993823.61 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:55:28,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.37 vs. limit=10.0 2023-11-27 19:55:29,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=12.0 2023-11-27 19:55:29,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.26 vs. 
limit=22.5 2023-11-27 19:56:11,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3211960.0, ans=0.125 2023-11-27 19:56:16,540 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481800 2023-11-27 19:56:22,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-27 19:56:23,306 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 850, loss[loss=0.08552, simple_loss=0.1209, pruned_loss=0.01697, audio_tagging_loss=0.0081, over 14891.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09074, pruned_loss=0.0128, audio_tagging_loss=0.008978, over 3008276.84 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:56:27,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-27 19:56:30,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-27 19:56:56,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3212160.0, ans=0.125 2023-11-27 19:57:00,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.782e+01 9.230e+01 1.009e+02 1.508e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 19:57:10,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3212293.3333333335, ans=0.125 2023-11-27 19:57:14,849 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481850 2023-11-27 19:57:21,337 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 900, loss[loss=0.07726, simple_loss=0.1069, pruned_loss=0.01737, audio_tagging_loss=0.006421, over 14680.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09042, pruned_loss=0.01276, audio_tagging_loss=0.009057, over 3014882.92 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:57:34,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3212426.6666666665, ans=0.0 2023-11-27 19:58:07,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3212626.6666666665, ans=0.125 2023-11-27 19:58:12,793 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481900 2023-11-27 19:58:19,349 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 950, loss[loss=0.05464, simple_loss=0.0747, pruned_loss=0.00849, audio_tagging_loss=0.008798, over 15464.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09116, pruned_loss=0.01286, audio_tagging_loss=0.008914, over 3024196.98 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:58:31,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3212760.0, ans=0.0 2023-11-27 19:58:39,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. 
limit=15.0 2023-11-27 19:58:56,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 9.001e+01 9.830e+01 1.057e+02 1.367e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 19:59:10,357 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481950 2023-11-27 19:59:16,913 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1000, loss[loss=0.0743, simple_loss=0.1046, pruned_loss=0.01517, audio_tagging_loss=0.006827, over 14102.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09061, pruned_loss=0.01277, audio_tagging_loss=0.008806, over 3021791.38 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:59:30,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3213093.3333333335, ans=0.5 2023-11-27 19:59:37,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.34 vs. limit=22.5 2023-11-27 19:59:38,018 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:59:44,554 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:59:47,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2023-11-27 19:59:49,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3213160.0, ans=0.125 2023-11-27 20:00:06,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3213293.3333333335, ans=0.0 2023-11-27 20:00:07,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3213293.3333333335, ans=0.1 2023-11-27 20:00:08,208 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482000 2023-11-27 20:00:09,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-27 20:00:09,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-27 20:00:15,083 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1050, loss[loss=0.06182, simple_loss=0.08093, pruned_loss=0.01304, audio_tagging_loss=0.008319, over 15422.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09084, pruned_loss=0.01278, audio_tagging_loss=0.008726, over 3027757.66 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:00:30,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3213426.6666666665, ans=0.125 2023-11-27 20:00:50,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.96 vs. 
limit=22.5 2023-11-27 20:00:51,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.496e+01 9.150e+01 1.002e+02 1.300e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 20:01:03,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3213626.6666666665, ans=0.125 2023-11-27 20:01:04,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3213626.6666666665, ans=0.0 2023-11-27 20:01:05,877 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482050 2023-11-27 20:01:07,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3213626.6666666665, ans=0.2 2023-11-27 20:01:13,677 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1100, loss[loss=0.07023, simple_loss=0.09546, pruned_loss=0.01218, audio_tagging_loss=0.01032, over 15513.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09066, pruned_loss=0.0127, audio_tagging_loss=0.008736, over 3037707.56 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:01:15,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=12.0 2023-11-27 20:01:18,144 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:02:04,188 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482100 2023-11-27 20:02:08,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.69 vs. limit=15.0 2023-11-27 20:02:10,943 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1150, loss[loss=0.07738, simple_loss=0.1054, pruned_loss=0.01735, audio_tagging_loss=0.007319, over 14929.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09144, pruned_loss=0.0127, audio_tagging_loss=0.008652, over 3042959.24 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:02:32,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3214093.3333333335, ans=0.0 2023-11-27 20:02:33,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3214160.0, ans=0.125 2023-11-27 20:02:33,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2023-11-27 20:02:48,776 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.534e+01 9.220e+01 9.959e+01 1.599e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-27 20:02:49,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.05 vs. 
limit=15.0 2023-11-27 20:03:02,485 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482150 2023-11-27 20:03:09,504 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1200, loss[loss=0.08764, simple_loss=0.128, pruned_loss=0.01721, audio_tagging_loss=0.006444, over 15341.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09156, pruned_loss=0.01266, audio_tagging_loss=0.008601, over 3052415.83 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:03:17,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3214360.0, ans=0.2 2023-11-27 20:03:24,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3214426.6666666665, ans=0.125 2023-11-27 20:03:29,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3214426.6666666665, ans=0.125 2023-11-27 20:03:33,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3214493.3333333335, ans=0.125 2023-11-27 20:03:47,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3214560.0, ans=0.0 2023-11-27 20:03:58,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3214626.6666666665, ans=0.2 2023-11-27 20:03:59,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3214626.6666666665, ans=0.09899494936611666 2023-11-27 20:04:00,208 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482200 2023-11-27 20:04:07,630 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1250, loss[loss=0.0595, simple_loss=0.0787, pruned_loss=0.00931, audio_tagging_loss=0.01084, over 15072.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09111, pruned_loss=0.01265, audio_tagging_loss=0.008574, over 3046707.83 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:04:33,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2023-11-27 20:04:44,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.597e+01 9.538e+01 1.019e+02 1.522e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 20:04:58,107 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482250 2023-11-27 20:05:04,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3215026.6666666665, ans=0.125 2023-11-27 20:05:05,191 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1300, loss[loss=0.04761, simple_loss=0.0612, pruned_loss=0.00815, audio_tagging_loss=0.008858, over 15309.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.091, pruned_loss=0.01259, audio_tagging_loss=0.008562, over 3045157.39 frames. 
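
Note on the loss fields: in each train_asr.py entry, loss[...] reports the current batch while tot_loss[...] is a smoothed running aggregate (here over roughly three million recent training frames, hence "over 3045157.39 frames"). The logged totals are consistent with a fixed linear combination of the three components, loss ~ 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for the batch 1300 aggregate just above, 0.5 * 0.091 + 0.01259 + 0.008562 = 0.06666. A minimal sketch, with the 0.5 weight inferred by fitting the logged numbers rather than taken from the training code:

    # Sketch: reproduce the logged total from its components.
    # The 0.5 weight on simple_loss is inferred from the logged values.
    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float, simple_scale: float = 0.5) -> float:
        return simple_scale * simple_loss + pruned_loss + audio_tagging_loss

    # tot_loss components of epoch 41, batch 1300 (above):
    assert abs(combined_loss(0.091, 0.01259, 0.008562) - 0.06666) < 5e-5
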
], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:05:13,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3215026.6666666665, ans=0.0 2023-11-27 20:05:36,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3215160.0, ans=0.125 2023-11-27 20:05:36,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3215160.0, ans=0.1 2023-11-27 20:05:55,738 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482300 2023-11-27 20:06:03,057 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1350, loss[loss=0.06527, simple_loss=0.08902, pruned_loss=0.01254, audio_tagging_loss=0.00822, over 14668.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09005, pruned_loss=0.01241, audio_tagging_loss=0.00864, over 3047528.14 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:06:06,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3215360.0, ans=0.125 2023-11-27 20:06:14,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3215426.6666666665, ans=0.09899494936611666 2023-11-27 20:06:22,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3215426.6666666665, ans=0.05 2023-11-27 20:06:22,820 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2023-11-27 20:06:25,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3215493.3333333335, ans=0.125 2023-11-27 20:06:32,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0 2023-11-27 20:06:40,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.606e+01 9.244e+01 9.716e+01 1.166e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 20:06:40,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3215560.0, ans=0.1 2023-11-27 20:06:47,026 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 20:06:52,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3215626.6666666665, ans=0.125 2023-11-27 20:06:53,627 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482350 2023-11-27 20:06:54,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3215626.6666666665, ans=0.0 2023-11-27 20:07:00,782 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1400, loss[loss=0.04899, simple_loss=0.07487, pruned_loss=0.004807, audio_tagging_loss=0.00675, over 14673.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08968, pruned_loss=0.01233, audio_tagging_loss=0.008727, over 3045616.41 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:07:10,010 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:07:14,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3215760.0, ans=0.0 2023-11-27 20:07:20,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3215760.0, ans=6.0 2023-11-27 20:07:30,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3215826.6666666665, ans=0.2 2023-11-27 20:07:32,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3215826.6666666665, ans=0.0 2023-11-27 20:07:40,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215893.3333333335, ans=0.1 2023-11-27 20:07:40,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3215893.3333333335, ans=0.125 2023-11-27 20:07:44,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3215893.3333333335, ans=0.0 2023-11-27 20:07:51,634 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482400 2023-11-27 20:07:58,321 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1450, loss[loss=0.06739, simple_loss=0.09581, pruned_loss=0.01139, audio_tagging_loss=0.008097, over 15137.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09043, pruned_loss=0.01249, audio_tagging_loss=0.008814, over 3051403.59 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:07:58,559 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:08:23,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3216160.0, ans=0.2 2023-11-27 20:08:26,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3216160.0, ans=0.09899494936611666 2023-11-27 20:08:26,412 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. 
limit=15.0 2023-11-27 20:08:36,396 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.746e+01 9.275e+01 1.017e+02 1.401e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 20:08:41,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2023-11-27 20:08:49,247 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482450 2023-11-27 20:08:56,206 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1500, loss[loss=0.05616, simple_loss=0.06826, pruned_loss=0.01009, audio_tagging_loss=0.01194, over 14699.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09022, pruned_loss=0.0125, audio_tagging_loss=0.008904, over 3040552.57 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:09:12,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3216426.6666666665, ans=0.0 2023-11-27 20:09:13,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3216426.6666666665, ans=0.0 2023-11-27 20:09:14,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0 2023-11-27 20:09:21,365 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:09:33,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3216560.0, ans=0.2 2023-11-27 20:09:47,263 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482500 2023-11-27 20:09:50,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3216626.6666666665, ans=0.0 2023-11-27 20:09:50,832 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:09:53,904 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1550, loss[loss=0.08057, simple_loss=0.1198, pruned_loss=0.01522, audio_tagging_loss=0.00543, over 14650.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09044, pruned_loss=0.01249, audio_tagging_loss=0.008938, over 3046393.81 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:09:56,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3216693.3333333335, ans=0.125 2023-11-27 20:10:08,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3216760.0, ans=0.125 2023-11-27 20:10:13,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3216760.0, ans=0.2 2023-11-27 20:10:22,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.08 vs. 
limit=15.0 2023-11-27 20:10:23,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3216826.6666666665, ans=0.0 2023-11-27 20:10:32,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.858e+01 9.389e+01 9.907e+01 1.182e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 20:10:33,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3216893.3333333335, ans=0.125 2023-11-27 20:10:45,411 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482550 2023-11-27 20:10:51,983 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1600, loss[loss=0.06429, simple_loss=0.08462, pruned_loss=0.01176, audio_tagging_loss=0.01022, over 15315.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08997, pruned_loss=0.01247, audio_tagging_loss=0.009007, over 3044817.00 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:10:59,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217026.6666666665, ans=0.1 2023-11-27 20:11:42,424 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482600 2023-11-27 20:11:49,921 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1650, loss[loss=0.06419, simple_loss=0.08144, pruned_loss=0.01226, audio_tagging_loss=0.01121, over 15130.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08995, pruned_loss=0.01245, audio_tagging_loss=0.008986, over 3039857.13 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:11:50,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3217360.0, ans=0.07 2023-11-27 20:12:03,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.89 vs. limit=10.0 2023-11-27 20:12:08,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3217426.6666666665, ans=0.1 2023-11-27 20:12:14,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3217493.3333333335, ans=0.125 2023-11-27 20:12:27,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.730e+01 9.445e+01 1.002e+02 1.391e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 20:12:35,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2023-11-27 20:12:38,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3217626.6666666665, ans=0.04949747468305833 2023-11-27 20:12:40,718 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482650 2023-11-27 20:12:44,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3217626.6666666665, ans=0.125 2023-11-27 20:12:47,280 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1700, loss[loss=0.06219, simple_loss=0.08938, pruned_loss=0.01019, audio_tagging_loss=0.007307, over 15841.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08929, pruned_loss=0.01232, audio_tagging_loss=0.009031, over 3039448.18 frames. 
], batch size: 61, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:13:00,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3217760.0, ans=0.2 2023-11-27 20:13:03,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2023-11-27 20:13:21,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3217893.3333333335, ans=0.125 2023-11-27 20:13:28,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3217893.3333333335, ans=0.2 2023-11-27 20:13:31,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3217893.3333333335, ans=0.2 2023-11-27 20:13:38,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482700 2023-11-27 20:13:38,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3217960.0, ans=0.2 2023-11-27 20:13:43,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3217960.0, ans=0.125 2023-11-27 20:13:45,107 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1750, loss[loss=0.06244, simple_loss=0.07869, pruned_loss=0.01231, audio_tagging_loss=0.01078, over 14868.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08924, pruned_loss=0.01227, audio_tagging_loss=0.008961, over 3047975.31 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:13:48,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3218026.6666666665, ans=0.0 2023-11-27 20:13:51,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2023-11-27 20:14:04,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3218093.3333333335, ans=0.125 2023-11-27 20:14:22,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3218226.6666666665, ans=0.1 2023-11-27 20:14:23,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.743e+01 9.232e+01 9.959e+01 1.189e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 20:14:34,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3218293.3333333335, ans=0.1 2023-11-27 20:14:35,672 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482750 2023-11-27 20:14:40,578 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-11-27 20:14:42,277 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1800, loss[loss=0.05345, simple_loss=0.07414, pruned_loss=0.008341, audio_tagging_loss=0.008042, over 15353.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08894, pruned_loss=0.01226, audio_tagging_loss=0.008863, over 3046162.63 frames. 
], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:15:16,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-11-27 20:15:33,365 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482800 2023-11-27 20:15:37,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3218626.6666666665, ans=0.1 2023-11-27 20:15:38,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-11-27 20:15:40,752 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1850, loss[loss=0.03691, simple_loss=0.05424, pruned_loss=0.003971, audio_tagging_loss=0.005816, over 14409.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.0885, pruned_loss=0.01227, audio_tagging_loss=0.008839, over 3045915.75 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:16:01,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3218760.0, ans=0.0 2023-11-27 20:16:18,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.712e+01 9.397e+01 9.825e+01 1.168e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 20:16:23,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3218893.3333333335, ans=0.1 2023-11-27 20:16:28,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3218960.0, ans=0.5 2023-11-27 20:16:31,970 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482850 2023-11-27 20:16:38,531 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1900, loss[loss=0.06486, simple_loss=0.08379, pruned_loss=0.01248, audio_tagging_loss=0.01049, over 15141.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08907, pruned_loss=0.01247, audio_tagging_loss=0.008903, over 3047378.99 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:16:45,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2023-11-27 20:17:05,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3219160.0, ans=0.0 2023-11-27 20:17:12,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-27 20:17:12,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.85 vs. limit=22.5 2023-11-27 20:17:17,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3219226.6666666665, ans=0.1 2023-11-27 20:17:29,256 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482900 2023-11-27 20:17:35,808 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1950, loss[loss=0.06484, simple_loss=0.08444, pruned_loss=0.01153, audio_tagging_loss=0.01109, over 14830.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08885, pruned_loss=0.01235, audio_tagging_loss=0.00888, over 3044356.93 frames. 
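
Note on the optim.py entries: in every "grad-norm quartiles" line, the reported threshold equals Clipping_scale times the middle (median) value of the five grad-norm statistics (e.g. 2.0 * 9.397e+01 = 1.879e+02 just above), i.e. gradients are clipped relative to a running median of recent gradient norms; percent-clipped=0.0 means no batch in the window exceeded that threshold. A sketch of the rule (the windowed median is an assumption):

    import statistics

    # Sketch: clipping threshold = clipping_scale * median(recent grad norms).
    def clip_threshold(recent_norms: list[float], clipping_scale: float = 2.0) -> float:
        return clipping_scale * statistics.median(recent_norms)

    # The quartile line above reports a median grad-norm of 9.397e+01:
    assert abs(2.0 * 9.397e+01 - 1.879e+02) < 0.05
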
], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:17:41,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3219360.0, ans=0.0 2023-11-27 20:17:54,162 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:17:56,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3219426.6666666665, ans=0.2 2023-11-27 20:18:15,538 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.669e+01 9.288e+01 9.966e+01 1.212e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 20:18:27,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482950 2023-11-27 20:18:32,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3219626.6666666665, ans=0.125 2023-11-27 20:18:34,221 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2000, loss[loss=0.09346, simple_loss=0.135, pruned_loss=0.01799, audio_tagging_loss=0.007979, over 14474.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08881, pruned_loss=0.01259, audio_tagging_loss=0.00885, over 3041080.37 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:18:34,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3219693.3333333335, ans=0.0 2023-11-27 20:18:34,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3219693.3333333335, ans=0.0 2023-11-27 20:18:47,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2023-11-27 20:19:03,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3219826.6666666665, ans=0.125 2023-11-27 20:19:10,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3219893.3333333335, ans=0.0 2023-11-27 20:19:12,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2023-11-27 20:19:25,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483000 2023-11-27 20:19:29,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0 2023-11-27 20:19:32,544 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2050, loss[loss=0.07037, simple_loss=0.09574, pruned_loss=0.01461, audio_tagging_loss=0.007895, over 13675.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08924, pruned_loss=0.01265, audio_tagging_loss=0.00876, over 3042577.65 frames. 
], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:19:43,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3220093.3333333335, ans=0.1 2023-11-27 20:19:51,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3220093.3333333335, ans=0.0 2023-11-27 20:20:00,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3220160.0, ans=0.0 2023-11-27 20:20:12,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 8.893e+01 9.583e+01 1.011e+02 1.256e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 20:20:13,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3220226.6666666665, ans=0.125 2023-11-27 20:20:16,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2023-11-27 20:20:22,987 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483050 2023-11-27 20:20:29,580 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2100, loss[loss=0.07061, simple_loss=0.09342, pruned_loss=0.01419, audio_tagging_loss=0.00971, over 16725.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09034, pruned_loss=0.01278, audio_tagging_loss=0.008616, over 3047014.58 frames. ], batch size: 62, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:20:45,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3220426.6666666665, ans=0.125 2023-11-27 20:20:54,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=22.5 2023-11-27 20:20:57,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3220493.3333333335, ans=0.125 2023-11-27 20:21:13,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3220560.0, ans=0.125 2023-11-27 20:21:20,566 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483100 2023-11-27 20:21:27,427 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2150, loss[loss=0.06634, simple_loss=0.08922, pruned_loss=0.01141, audio_tagging_loss=0.01031, over 14266.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09042, pruned_loss=0.01263, audio_tagging_loss=0.008601, over 3046080.97 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:21:28,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3220693.3333333335, ans=0.125 2023-11-27 20:21:34,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-27 20:21:55,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2023-11-27 20:22:03,880 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:22:07,611 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.704e+01 9.254e+01 9.792e+01 1.378e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 20:22:13,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3220960.0, ans=0.0 2023-11-27 20:22:17,582 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483150 2023-11-27 20:22:25,331 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2200, loss[loss=0.07495, simple_loss=0.09687, pruned_loss=0.01447, audio_tagging_loss=0.01205, over 15073.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09047, pruned_loss=0.0126, audio_tagging_loss=0.008663, over 3047494.11 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:22:33,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2023-11-27 20:22:34,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3221026.6666666665, ans=0.125 2023-11-27 20:22:43,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3221093.3333333335, ans=0.1 2023-11-27 20:22:44,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3221093.3333333335, ans=0.125 2023-11-27 20:22:52,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3221160.0, ans=0.1 2023-11-27 20:22:53,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-27 20:22:58,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3221226.6666666665, ans=0.07 2023-11-27 20:23:16,025 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483200 2023-11-27 20:23:19,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3221293.3333333335, ans=0.2 2023-11-27 20:23:23,023 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2250, loss[loss=0.07831, simple_loss=0.1005, pruned_loss=0.01792, audio_tagging_loss=0.01016, over 15338.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09006, pruned_loss=0.01268, audio_tagging_loss=0.008753, over 3049284.38 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:23:26,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.12 vs. 
limit=15.0 2023-11-27 20:23:49,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3221493.3333333335, ans=0.0 2023-11-27 20:24:01,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3221560.0, ans=0.125 2023-11-27 20:24:03,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.981e+01 8.930e+01 9.422e+01 1.015e+02 1.618e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-27 20:24:13,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3221626.6666666665, ans=0.2 2023-11-27 20:24:14,037 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483250 2023-11-27 20:24:15,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3221626.6666666665, ans=0.125 2023-11-27 20:24:21,363 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2300, loss[loss=0.09082, simple_loss=0.1241, pruned_loss=0.02274, audio_tagging_loss=0.006033, over 16803.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09003, pruned_loss=0.0126, audio_tagging_loss=0.008735, over 3049516.70 frames. ], batch size: 63, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:24:32,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3221760.0, ans=0.125 2023-11-27 20:24:47,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2023-11-27 20:25:03,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3221893.3333333335, ans=0.0 2023-11-27 20:25:11,962 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483300 2023-11-27 20:25:14,153 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:25:16,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3221960.0, ans=0.125 2023-11-27 20:25:19,106 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2350, loss[loss=0.04326, simple_loss=0.04877, pruned_loss=0.006007, audio_tagging_loss=0.01286, over 14457.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08935, pruned_loss=0.01249, audio_tagging_loss=0.008833, over 3045045.21 frames. 
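
Note on the ScheduledFloat entries: these track hyperparameters (balancer probabilities, skip rates, dropout rates, bypass scale minima) whose value is a function of the training batch count; ans is the value in effect at the logged batch_count. Constant readings such as 0.125, 0.2 and 0.0 here indicate schedules that long since reached their final breakpoint. A minimal sketch of a piecewise-linear schedule, with illustrative breakpoints rather than the recipe's actual settings:

    # Sketch: a float whose value is piecewise-linear in the batch count.
    # The breakpoints below are illustrative only.
    class ScheduledFloat:
        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:  # linear interpolation between breakpoints
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.125))
    assert skip_rate.value(3222360.0) == 0.125  # far past the final breakpoint
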
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:25:55,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3222226.6666666665, ans=0.125 2023-11-27 20:25:56,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3222226.6666666665, ans=0.125 2023-11-27 20:25:58,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.811e+01 9.279e+01 1.007e+02 1.436e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 20:26:09,493 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483350 2023-11-27 20:26:12,700 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:26:15,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3222360.0, ans=0.0 2023-11-27 20:26:16,846 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2400, loss[loss=0.07008, simple_loss=0.0959, pruned_loss=0.01439, audio_tagging_loss=0.007744, over 15468.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08957, pruned_loss=0.01256, audio_tagging_loss=0.008965, over 3047020.84 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:26:17,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3222360.0, ans=0.0 2023-11-27 20:26:25,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3222360.0, ans=0.2 2023-11-27 20:26:34,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3222426.6666666665, ans=0.0 2023-11-27 20:26:36,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0 2023-11-27 20:26:49,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3222493.3333333335, ans=0.2 2023-11-27 20:27:00,309 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:27:03,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3222626.6666666665, ans=0.125 2023-11-27 20:27:07,828 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483400 2023-11-27 20:27:15,126 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2450, loss[loss=0.06986, simple_loss=0.1019, pruned_loss=0.01188, audio_tagging_loss=0.007046, over 14985.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08951, pruned_loss=0.01253, audio_tagging_loss=0.00906, over 3045279.01 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:27:22,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. 
limit=6.0 2023-11-27 20:27:56,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.599e+01 9.201e+01 9.948e+01 1.437e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-27 20:27:58,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3222893.3333333335, ans=0.1 2023-11-27 20:28:05,348 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:28:06,182 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483450 2023-11-27 20:28:12,712 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2500, loss[loss=0.06199, simple_loss=0.06937, pruned_loss=0.01499, audio_tagging_loss=0.01231, over 15140.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08905, pruned_loss=0.01252, audio_tagging_loss=0.00918, over 3040467.15 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:28:55,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3223226.6666666665, ans=0.125 2023-11-27 20:29:04,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483500 2023-11-27 20:29:10,718 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2550, loss[loss=0.07867, simple_loss=0.1146, pruned_loss=0.01353, audio_tagging_loss=0.007855, over 15369.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08859, pruned_loss=0.01247, audio_tagging_loss=0.009052, over 3044920.73 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:29:18,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2023-11-27 20:29:36,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3223493.3333333335, ans=0.125 2023-11-27 20:29:46,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3223560.0, ans=0.125 2023-11-27 20:29:51,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5 2023-11-27 20:29:52,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.657e+01 9.326e+01 1.025e+02 1.204e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 20:29:53,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3223560.0, ans=0.0 2023-11-27 20:29:54,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3223560.0, ans=0.0 2023-11-27 20:30:01,928 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483550 2023-11-27 20:30:08,842 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2600, loss[loss=0.05526, simple_loss=0.07553, pruned_loss=0.009685, audio_tagging_loss=0.00781, over 15170.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08893, pruned_loss=0.01253, audio_tagging_loss=0.008884, over 3045159.91 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:30:16,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223693.3333333335, ans=0.1 2023-11-27 20:30:18,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3223693.3333333335, ans=22.5 2023-11-27 20:30:56,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3223960.0, ans=0.0 2023-11-27 20:30:57,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3223960.0, ans=0.125 2023-11-27 20:30:59,538 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483600 2023-11-27 20:31:06,354 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2650, loss[loss=0.06996, simple_loss=0.1093, pruned_loss=0.008649, audio_tagging_loss=0.006655, over 15558.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08909, pruned_loss=0.01254, audio_tagging_loss=0.008824, over 3044515.38 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:31:31,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3224160.0, ans=0.125 2023-11-27 20:31:46,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3224226.6666666665, ans=0.0 2023-11-27 20:31:48,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.676e+01 9.510e+01 9.992e+01 1.225e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 20:31:54,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3224293.3333333335, ans=0.125 2023-11-27 20:31:55,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3224293.3333333335, ans=0.1 2023-11-27 20:31:57,844 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483650 2023-11-27 20:32:04,334 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2700, loss[loss=0.06897, simple_loss=0.09178, pruned_loss=0.01312, audio_tagging_loss=0.009961, over 14887.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09006, pruned_loss=0.0127, audio_tagging_loss=0.008697, over 3047090.52 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:32:05,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3224360.0, ans=0.0 2023-11-27 20:32:21,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3224426.6666666665, ans=0.125 2023-11-27 20:32:48,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3224560.0, ans=0.125 2023-11-27 20:32:55,153 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483700 2023-11-27 20:33:02,273 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2750, loss[loss=0.06117, simple_loss=0.08269, pruned_loss=0.01072, audio_tagging_loss=0.009103, over 13703.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08937, pruned_loss=0.01256, audio_tagging_loss=0.008736, over 3054272.46 frames. 
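
Note on grad_scale: the field flipping between 16.0 and 32.0 across these entries is the dynamic loss scale of mixed-precision training; the scaler doubles the scale after a long enough run of finite-gradient steps and halves it when an overflow is detected, so occasional drops from 32.0 back to 16.0 are expected. A minimal sketch of that loop with PyTorch's GradScaler (model, optimizer and batch are placeholders, not the script's actual objects):

    import torch

    scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # skips the update on inf/nan grads
        scaler.update()                # grows or shrinks the scale
        return scaler.get_scale()      # the value logged as grad_scale
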
], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:33:06,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3224693.3333333335, ans=0.2 2023-11-27 20:33:33,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2023-11-27 20:33:41,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3224893.3333333335, ans=0.0 2023-11-27 20:33:43,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.556e+01 9.189e+01 9.890e+01 1.172e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 20:33:49,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3224960.0, ans=0.0 2023-11-27 20:33:53,795 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:33:53,830 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483750 2023-11-27 20:34:00,309 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2800, loss[loss=0.06879, simple_loss=0.09444, pruned_loss=0.01226, audio_tagging_loss=0.009313, over 14454.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08964, pruned_loss=0.01251, audio_tagging_loss=0.008707, over 3051743.62 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:34:08,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3225026.6666666665, ans=15.0 2023-11-27 20:34:12,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3225093.3333333335, ans=0.1 2023-11-27 20:34:14,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3225093.3333333335, ans=0.0 2023-11-27 20:34:16,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3225093.3333333335, ans=0.0 2023-11-27 20:34:26,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3225160.0, ans=0.1 2023-11-27 20:34:29,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3225160.0, ans=0.0 2023-11-27 20:34:35,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3225226.6666666665, ans=0.125 2023-11-27 20:34:42,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3225226.6666666665, ans=0.125 2023-11-27 20:34:48,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.22 vs. 
limit=15.0 2023-11-27 20:34:51,465 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483800 2023-11-27 20:34:58,465 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2850, loss[loss=0.09126, simple_loss=0.1226, pruned_loss=0.02259, audio_tagging_loss=0.007367, over 14132.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08998, pruned_loss=0.01264, audio_tagging_loss=0.008643, over 3043929.30 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:35:03,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3225360.0, ans=0.0 2023-11-27 20:35:19,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3225426.6666666665, ans=0.125 2023-11-27 20:35:21,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2023-11-27 20:35:41,036 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.834e+01 9.311e+01 1.027e+02 1.174e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-27 20:35:47,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3225626.6666666665, ans=0.2 2023-11-27 20:35:48,733 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483850 2023-11-27 20:35:55,316 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2900, loss[loss=0.06086, simple_loss=0.07912, pruned_loss=0.01087, audio_tagging_loss=0.01043, over 15534.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09021, pruned_loss=0.01262, audio_tagging_loss=0.008742, over 3046274.15 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:35:56,675 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:35:59,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3225693.3333333335, ans=0.125 2023-11-27 20:36:04,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3225693.3333333335, ans=0.0 2023-11-27 20:36:10,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3225760.0, ans=0.1 2023-11-27 20:36:46,559 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483900 2023-11-27 20:36:49,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3225960.0, ans=0.1 2023-11-27 20:36:52,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3226026.6666666665, ans=0.2 2023-11-27 20:36:53,780 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2950, loss[loss=0.05331, simple_loss=0.07242, pruned_loss=0.008542, audio_tagging_loss=0.008557, over 14635.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09068, pruned_loss=0.01276, audio_tagging_loss=0.008806, over 3043623.22 frames. 
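
Note on the Whitening entries: each reports a per-module statistic against a limit (e.g. metric=7.36 vs. limit=15.0 above); the whitening penalty that pushes activation covariances toward isotropy is active only while the metric exceeds the limit. One plausible reading of the metric, sketched under the assumption that it measures the eigenvalue spread of the feature covariance (1.0 for a perfectly white covariance, growing with anisotropy):

    import torch

    # Sketch: an isotropy metric for activations x of shape (N, num_channels).
    # mean(eig**2) / mean(eig)**2 == 1.0 iff all covariance eigenvalues are
    # equal; this reading of the logged "metric" is an assumption.
    def whitening_metric(x: torch.Tensor) -> float:
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)
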
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:36:58,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3226026.6666666665, ans=0.125 2023-11-27 20:36:58,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-27 20:37:04,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3226093.3333333335, ans=0.125 2023-11-27 20:37:12,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3226093.3333333335, ans=0.0 2023-11-27 20:37:13,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226093.3333333335, ans=0.1 2023-11-27 20:37:29,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-27 20:37:31,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3226226.6666666665, ans=0.125 2023-11-27 20:37:36,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.664e+01 9.410e+01 9.930e+01 1.488e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 20:37:44,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483950 2023-11-27 20:37:51,784 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3000, loss[loss=0.06931, simple_loss=0.09582, pruned_loss=0.01099, audio_tagging_loss=0.0104, over 15159.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09087, pruned_loss=0.01281, audio_tagging_loss=0.008878, over 3047486.17 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:37:51,784 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 20:38:26,094 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.0572, simple_loss=0.05061, pruned_loss=0.005192, audio_tagging_loss=0.0267, over 4681554.00 frames. 2023-11-27 20:38:26,095 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 20:38:31,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3226360.0, ans=0.125 2023-11-27 20:38:36,102 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:38:57,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2023-11-27 20:39:07,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3226560.0, ans=0.0 2023-11-27 20:39:17,141 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484000 2023-11-27 20:39:26,481 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3050, loss[loss=0.0637, simple_loss=0.08267, pruned_loss=0.01342, audio_tagging_loss=0.008943, over 15120.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09091, pruned_loss=0.01275, audio_tagging_loss=0.008869, over 3047583.06 frames. 
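
Note on the validation entry above: unlike the per-batch figures, the validation loss is a frame-weighted average over the full dev set (4,681,554 frames) with the same component breakdown, and audio_tagging_loss is its largest single term (0.0267 vs. 0.5 * 0.05061 = 0.0253 and 0.005192). The "over N frames" bookkeeping can be reproduced with a tracker that accumulates frame-weighted sums, sketched below with illustrative names:

    # Sketch: frame-weighted accumulation behind the "over N frames" figures.
    # Class and method names are illustrative.
    class LossTracker:
        def __init__(self):
            self.sums: dict[str, float] = {}
            self.frames = 0.0

        def update(self, num_frames: float, **losses: float) -> None:
            self.frames += num_frames
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

        def average(self) -> dict[str, float]:
            return {name: s / self.frames for name, s in self.sums.items()}
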
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:39:45,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3226760.0, ans=0.125 2023-11-27 20:39:59,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3226826.6666666665, ans=0.125 2023-11-27 20:40:01,234 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:40:09,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.867e+01 9.400e+01 1.012e+02 1.240e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 20:40:17,834 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484050 2023-11-27 20:40:17,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3226960.0, ans=0.025 2023-11-27 20:40:18,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3226960.0, ans=0.125 2023-11-27 20:40:24,355 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3100, loss[loss=0.04717, simple_loss=0.06049, pruned_loss=0.005147, audio_tagging_loss=0.01178, over 14878.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09141, pruned_loss=0.01276, audio_tagging_loss=0.008835, over 3055642.43 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:40:28,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3227026.6666666665, ans=0.05 2023-11-27 20:40:28,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3227026.6666666665, ans=0.125 2023-11-27 20:40:31,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2023-11-27 20:40:53,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3227160.0, ans=0.125 2023-11-27 20:41:04,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3227226.6666666665, ans=0.5 2023-11-27 20:41:05,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-27 20:41:14,717 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484100 2023-11-27 20:41:21,279 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3150, loss[loss=0.07436, simple_loss=0.1192, pruned_loss=0.008808, audio_tagging_loss=0.005953, over 14847.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09176, pruned_loss=0.01297, audio_tagging_loss=0.008941, over 3056529.99 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:41:21,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3227360.0, ans=0.125 2023-11-27 20:41:32,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3227426.6666666665, ans=0.125 2023-11-27 20:41:38,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3227426.6666666665, ans=0.025 2023-11-27 20:41:45,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3227493.3333333335, ans=0.125 2023-11-27 20:42:04,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.840e+01 9.395e+01 9.954e+01 1.405e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 20:42:12,812 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484150 2023-11-27 20:42:13,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3227626.6666666665, ans=0.0 2023-11-27 20:42:17,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3227626.6666666665, ans=0.125 2023-11-27 20:42:19,270 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3200, loss[loss=0.06677, simple_loss=0.09321, pruned_loss=0.01163, audio_tagging_loss=0.008529, over 16222.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09189, pruned_loss=0.01291, audio_tagging_loss=0.008902, over 3056670.68 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:42:28,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3227693.3333333335, ans=0.125 2023-11-27 20:42:33,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=12.0 2023-11-27 20:42:34,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3227760.0, ans=0.0 2023-11-27 20:43:10,573 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484200 2023-11-27 20:43:18,039 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3250, loss[loss=0.07537, simple_loss=0.1016, pruned_loss=0.01566, audio_tagging_loss=0.008929, over 15250.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09021, pruned_loss=0.01259, audio_tagging_loss=0.009051, over 3054056.80 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:43:25,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. 
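limit=15.0

The scaling.py:1022 Whitening entries, like the one just above (metric=14.05 vs. limit=15.0), appear to be periodic diagnostics: each whitening module measures how far the covariance of its activations is from isotropic, and only steers gradients toward whiteness once the metric exceeds the limit. Below is one plausible way to compute such a metric from covariance traces; it is an assumed formulation for illustration, not the verbatim scaling.py code.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Anisotropy of the activation covariance, per channel group.

        Roughly 1.0 when the covariance is proportional to the identity
        ("white" activations) and larger as energy concentrates in fewer
        directions; an assumed formulation, not the scaling.py original.
        """
        x = x.reshape(-1, x.shape[-1])           # (frames, channels)
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x.transpose(0, 1)                    # (groups, frames, chans/group)
        covar = x.transpose(1, 2) @ x / num_frames
        d = covar.shape[-1]
        trace_c = covar.diagonal(dim1=-2, dim2=-1).sum(-1)
        trace_c2 = (covar * covar.transpose(-2, -1)).sum(dim=(-2, -1))
        # mean squared eigenvalue / squared mean eigenvalue, always >= 1:
        return (trace_c2 * d / trace_c.clamp(min=1e-20) ** 2).mean().item()

    x = torch.randn(1000, 128)                   # roughly white activations
    print(whitening_metric(x, num_groups=4))     # close to 1.0
    print(whitening_metric(x @ torch.randn(128, 128), num_groups=4))  # larger
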
2023-11-27 20:43:43,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3228160.0, ans=0.125 2023-11-27 20:43:46,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3228160.0, ans=0.0 2023-11-27 20:43:46,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3228160.0, ans=0.2 2023-11-27 20:43:50,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3228160.0, ans=0.09899494936611666 2023-11-27 20:43:50,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3228160.0, ans=0.125 2023-11-27 20:43:54,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3228226.6666666665, ans=0.125 2023-11-27 20:44:00,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.543e+01 9.307e+01 1.025e+02 1.528e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 20:44:08,474 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484250 2023-11-27 20:44:10,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3228293.3333333335, ans=0.125 2023-11-27 20:44:14,945 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3300, loss[loss=0.07054, simple_loss=0.09768, pruned_loss=0.01208, audio_tagging_loss=0.009618, over 15374.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08944, pruned_loss=0.01242, audio_tagging_loss=0.009179, over 3053683.13 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:44:27,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3228426.6666666665, ans=10.0 2023-11-27 20:44:34,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3228426.6666666665, ans=0.125 2023-11-27 20:44:40,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-27 20:44:41,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2023-11-27 20:44:49,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2023-11-27 20:44:54,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3228560.0, ans=0.0 2023-11-27 20:44:59,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs.
limit=6.0 2023-11-27 20:45:05,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3228626.6666666665, ans=0.0 2023-11-27 20:45:06,260 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484300 2023-11-27 20:45:12,792 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3350, loss[loss=0.07287, simple_loss=0.09689, pruned_loss=0.01412, audio_tagging_loss=0.01031, over 16214.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09024, pruned_loss=0.01255, audio_tagging_loss=0.009045, over 3050155.97 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:45:25,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3228760.0, ans=0.5 2023-11-27 20:45:27,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=22.5 2023-11-27 20:45:30,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=15.0 2023-11-27 20:45:30,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3228760.0, ans=0.5 2023-11-27 20:45:40,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3228826.6666666665, ans=0.125 2023-11-27 20:45:43,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3228826.6666666665, ans=0.125 2023-11-27 20:45:55,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.657e+01 9.246e+01 1.011e+02 1.317e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 20:45:57,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.00 vs. limit=22.5 2023-11-27 20:46:02,343 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484350 2023-11-27 20:46:10,125 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3400, loss[loss=0.07183, simple_loss=0.1038, pruned_loss=0.01075, audio_tagging_loss=0.009159, over 13686.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09071, pruned_loss=0.01267, audio_tagging_loss=0.008817, over 3050438.39 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:46:16,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3229026.6666666665, ans=0.2 2023-11-27 20:46:21,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3229093.3333333335, ans=0.0 2023-11-27 20:47:00,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484400 2023-11-27 20:47:07,146 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3450, loss[loss=0.05532, simple_loss=0.0722, pruned_loss=0.007802, audio_tagging_loss=0.01141, over 14658.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09067, pruned_loss=0.01262, audio_tagging_loss=0.00874, over 3046673.81 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:47:11,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3229360.0, ans=0.0 2023-11-27 20:47:17,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3229426.6666666665, ans=0.07 2023-11-27 20:47:50,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.393e+01 9.066e+01 9.893e+01 1.377e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-27 20:47:57,368 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484450 2023-11-27 20:48:04,357 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3500, loss[loss=0.07273, simple_loss=0.1017, pruned_loss=0.01429, audio_tagging_loss=0.007603, over 14565.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09066, pruned_loss=0.01264, audio_tagging_loss=0.008693, over 3044016.60 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:48:22,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2023-11-27 20:48:27,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3229826.6666666665, ans=0.2 2023-11-27 20:48:30,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3229826.6666666665, ans=0.02 2023-11-27 20:48:33,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2023-11-27 20:48:34,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3229826.6666666665, ans=0.0 2023-11-27 20:48:34,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3229826.6666666665, ans=0.125 2023-11-27 20:48:35,202 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:48:41,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3229893.3333333335, ans=0.0 2023-11-27 20:48:43,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3229893.3333333335, ans=0.0 2023-11-27 20:48:54,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484500 2023-11-27 20:48:57,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2023-11-27 20:49:01,727 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3550, loss[loss=0.06101, simple_loss=0.08002, pruned_loss=0.009931, audio_tagging_loss=0.01107, over 15527.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08988, pruned_loss=0.01243, audio_tagging_loss=0.008684, over 3041744.18 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:49:08,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3230026.6666666665, ans=0.125 2023-11-27 20:49:08,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3230026.6666666665, ans=0.0 2023-11-27 20:49:09,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3230026.6666666665, ans=0.125 2023-11-27 20:49:14,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3230093.3333333335, ans=0.125 2023-11-27 20:49:34,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3230160.0, ans=0.09899494936611666 2023-11-27 20:49:39,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3230226.6666666665, ans=0.125 2023-11-27 20:49:45,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.934e+01 8.597e+01 9.146e+01 1.002e+02 1.167e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 20:49:46,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-27 20:49:52,867 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484550 2023-11-27 20:49:59,410 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3600, loss[loss=0.06864, simple_loss=0.09452, pruned_loss=0.01122, audio_tagging_loss=0.01016, over 15077.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08914, pruned_loss=0.01241, audio_tagging_loss=0.008628, over 3037383.78 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:50:14,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3230426.6666666665, ans=0.125 2023-11-27 20:50:31,490 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:50:38,693 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:50:42,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3230560.0, ans=0.1 2023-11-27 20:50:48,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3230626.6666666665, ans=0.1 2023-11-27 20:50:49,687 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484600 2023-11-27 20:50:51,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3230626.6666666665, ans=0.0 2023-11-27 20:50:55,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3230626.6666666665, ans=0.0 2023-11-27 20:50:57,223 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3650, loss[loss=0.0553, simple_loss=0.07592, pruned_loss=0.008566, audio_tagging_loss=0.008778, over 15271.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08965, pruned_loss=0.01244, audio_tagging_loss=0.008492, over 3034209.93 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:50:57,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230693.3333333335, ans=0.1 2023-11-27 20:51:41,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.915e+01 9.732e+01 1.035e+02 1.318e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 20:51:45,468 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:51:47,439 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484650 2023-11-27 20:51:51,896 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:51:53,910 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3700, loss[loss=0.04939, simple_loss=0.06048, pruned_loss=0.008595, audio_tagging_loss=0.01056, over 15749.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09025, pruned_loss=0.01259, audio_tagging_loss=0.008475, over 3042806.71 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:52:03,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3231026.6666666665, ans=0.1 2023-11-27 20:52:21,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3231160.0, ans=0.125 2023-11-27 20:52:21,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=12.0 2023-11-27 20:52:45,157 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484700 2023-11-27 20:52:47,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3231293.3333333335, ans=0.125 2023-11-27 20:52:51,691 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3750, loss[loss=0.07459, simple_loss=0.1093, pruned_loss=0.0136, audio_tagging_loss=0.00633, over 15484.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09064, pruned_loss=0.01262, audio_tagging_loss=0.008488, over 3045124.46 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:53:12,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3231426.6666666665, ans=0.0 2023-11-27 20:53:19,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3231493.3333333335, ans=10.0 2023-11-27 20:53:22,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3231493.3333333335, ans=0.125 2023-11-27 20:53:25,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3231560.0, ans=0.125 2023-11-27 20:53:25,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3231560.0, ans=0.125 2023-11-27 20:53:33,162 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
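Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24

This warning, repeated for other one-second AudioSet clips throughout the log, is the recipe's sanity filter at work: AudioSet cuts carry only a dummy transcript, and after the encoder's 4x subsampling a 100-frame cut keeps just 23 frames, fewer than its 24 BPE tokens, so the transducer loss would be ill-defined and the cut is skipped. A sketch of the check is below; the exact frame-count formula is an assumption inferred from the numbers in these warnings.

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed formula for a Conv2d front end that subsamples twice by 2:
        # ((T - 7) // 2 + 1) // 2. For the 100-frame (1 s) cuts in these
        # warnings it yields 23, matching the log.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least as many encoder frames as tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # the excluded AudioSet placeholder cuts
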
2023-11-27 20:53:36,406 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.777e+01 9.406e+01 1.027e+02 1.290e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 20:53:42,600 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484750 2023-11-27 20:53:48,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3231693.3333333335, ans=0.125 2023-11-27 20:53:49,563 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3800, loss[loss=0.06154, simple_loss=0.08555, pruned_loss=0.009653, audio_tagging_loss=0.00911, over 14768.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09024, pruned_loss=0.01251, audio_tagging_loss=0.008585, over 3047762.30 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:53:55,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3231693.3333333335, ans=0.2 2023-11-27 20:53:56,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.59 vs. limit=15.0 2023-11-27 20:54:13,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-27 20:54:34,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3231960.0, ans=0.125 2023-11-27 20:54:39,438 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484800 2023-11-27 20:54:46,232 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3850, loss[loss=0.0842, simple_loss=0.1151, pruned_loss=0.01861, audio_tagging_loss=0.008045, over 15865.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09002, pruned_loss=0.01254, audio_tagging_loss=0.008653, over 3050036.95 frames.
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:54:58,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3232093.3333333335, ans=0.1 2023-11-27 20:55:06,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3232093.3333333335, ans=0.125 2023-11-27 20:55:08,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3232160.0, ans=0.1 2023-11-27 20:55:09,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3232160.0, ans=0.125 2023-11-27 20:55:31,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.660e+01 9.334e+01 1.001e+02 1.347e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 20:55:34,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3232293.3333333335, ans=0.125 2023-11-27 20:55:37,212 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484850 2023-11-27 20:55:40,530 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:55:43,719 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3900, loss[loss=0.05029, simple_loss=0.06035, pruned_loss=0.007178, audio_tagging_loss=0.01294, over 14434.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08951, pruned_loss=0.01248, audio_tagging_loss=0.008774, over 3047890.01 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:56:09,792 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:56:31,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3232626.6666666665, ans=0.125 2023-11-27 20:56:34,420 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484900 2023-11-27 20:56:42,119 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3950, loss[loss=0.06338, simple_loss=0.08777, pruned_loss=0.0106, audio_tagging_loss=0.008894, over 15746.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08884, pruned_loss=0.01228, audio_tagging_loss=0.00885, over 3051367.54 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:56:42,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.23 vs. limit=22.5 2023-11-27 20:56:42,653 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.52 vs. 
limit=15.0 2023-11-27 20:56:43,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3232693.3333333335, ans=0.125 2023-11-27 20:56:50,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3232693.3333333335, ans=0.125 2023-11-27 20:56:54,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3232760.0, ans=0.0 2023-11-27 20:57:01,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3232760.0, ans=0.0 2023-11-27 20:57:03,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3232760.0, ans=0.125 2023-11-27 20:57:04,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2023-11-27 20:57:17,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2023-11-27 20:57:18,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3232893.3333333335, ans=0.0 2023-11-27 20:57:26,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.571e+01 9.462e+01 1.016e+02 1.341e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 20:57:32,698 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484950 2023-11-27 20:57:39,311 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4000, loss[loss=0.08111, simple_loss=0.09774, pruned_loss=0.02322, audio_tagging_loss=0.009019, over 15932.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08979, pruned_loss=0.01241, audio_tagging_loss=0.008765, over 3040805.67 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:58:06,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3233160.0, ans=0.2 2023-11-27 20:58:07,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3233160.0, ans=0.125 2023-11-27 20:58:29,462 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485000 2023-11-27 20:58:36,267 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4050, loss[loss=0.08343, simple_loss=0.1219, pruned_loss=0.01603, audio_tagging_loss=0.006445, over 16004.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08962, pruned_loss=0.0123, audio_tagging_loss=0.008848, over 3038303.84 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:58:40,672 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
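Number of tokens: 24

Most of the scaling.py:213 traffic in this log traces ScheduledFloat values: dropout rates, skip rates and balancer probabilities that are piecewise-linear functions of the global batch count rather than fixed constants, with ans= giving the value in effect at the logged batch_count. A minimal stand-in is sketched below, assuming linear interpolation between (batch_count, value) breakpoints with clamping outside them; the real class is more featureful.

    import bisect

    class ScheduledFloat:
        """A float hyperparameter defined by (batch_count, value) breakpoints.

        Values are linearly interpolated between breakpoints and clamped
        outside them; an assumed reimplementation for illustration only.
        """

        def __init__(self, *points):
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches;
    # by batch_count ~3.23e6 it sits at its final value, as in the log:
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(3233560.0) == 0.1
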
2023-11-27 20:58:45,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3233360.0, ans=0.125 2023-11-27 20:58:49,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3233426.6666666665, ans=0.05 2023-11-27 20:58:53,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.79 vs. limit=6.0 2023-11-27 20:59:07,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3233493.3333333335, ans=0.0 2023-11-27 20:59:13,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3233560.0, ans=0.1 2023-11-27 20:59:16,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3233560.0, ans=0.125 2023-11-27 20:59:20,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 8.859e+01 9.526e+01 1.036e+02 1.251e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 20:59:25,734 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485050 2023-11-27 20:59:26,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3233626.6666666665, ans=0.2 2023-11-27 20:59:32,392 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4100, loss[loss=0.09277, simple_loss=0.1326, pruned_loss=0.01858, audio_tagging_loss=0.007869, over 15190.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09032, pruned_loss=0.01238, audio_tagging_loss=0.008783, over 3045855.48 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:59:32,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3233693.3333333335, ans=0.09899494936611666 2023-11-27 20:59:42,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3233693.3333333335, ans=0.0 2023-11-27 20:59:48,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.05 vs. limit=6.0 2023-11-27 20:59:57,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3233826.6666666665, ans=0.2 2023-11-27 21:00:01,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-11-27 21:00:16,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3233893.3333333335, ans=0.125 2023-11-27 21:00:23,212 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485100 2023-11-27 21:00:30,319 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4150, loss[loss=0.0751, simple_loss=0.1094, pruned_loss=0.01342, audio_tagging_loss=0.006989, over 14814.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09029, pruned_loss=0.01235, audio_tagging_loss=0.008689, over 3044588.34 frames.
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:00:37,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3234026.6666666665, ans=0.04949747468305833 2023-11-27 21:00:38,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3234026.6666666665, ans=0.125 2023-11-27 21:01:07,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3234226.6666666665, ans=0.2 2023-11-27 21:01:11,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3234226.6666666665, ans=0.2 2023-11-27 21:01:13,443 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:01:15,580 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.501e+01 9.252e+01 1.004e+02 1.216e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 21:01:20,617 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485150 2023-11-27 21:01:23,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=15.0 2023-11-27 21:01:27,037 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4200, loss[loss=0.0861, simple_loss=0.1143, pruned_loss=0.02079, audio_tagging_loss=0.008173, over 14581.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09037, pruned_loss=0.01242, audio_tagging_loss=0.008678, over 3040834.01 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:01:27,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3234360.0, ans=0.0 2023-11-27 21:01:30,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3234360.0, ans=0.125 2023-11-27 21:01:34,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3234360.0, ans=0.2 2023-11-27 21:01:39,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3234426.6666666665, ans=0.125 2023-11-27 21:01:46,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3234426.6666666665, ans=0.125 2023-11-27 21:02:05,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3234560.0, ans=0.125 2023-11-27 21:02:17,440 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485200 2023-11-27 21:02:24,401 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4250, loss[loss=0.06098, simple_loss=0.08005, pruned_loss=0.01002, audio_tagging_loss=0.01093, over 15523.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.0906, pruned_loss=0.01257, audio_tagging_loss=0.00859, over 3041084.19 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:02:48,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3234826.6666666665, ans=0.0 2023-11-27 21:03:10,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.065e+01 9.544e+01 1.011e+02 1.214e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 21:03:15,312 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485250 2023-11-27 21:03:20,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-11-27 21:03:21,921 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4300, loss[loss=0.07493, simple_loss=0.1112, pruned_loss=0.01451, audio_tagging_loss=0.004818, over 14734.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.0919, pruned_loss=0.01268, audio_tagging_loss=0.008485, over 3042396.41 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:03:27,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3235026.6666666665, ans=0.125 2023-11-27 21:03:57,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=15.0 2023-11-27 21:04:00,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3235226.6666666665, ans=0.1 2023-11-27 21:04:01,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-27 21:04:06,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.21 vs. limit=15.0 2023-11-27 21:04:08,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3235293.3333333335, ans=0.125 2023-11-27 21:04:12,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485300 2023-11-27 21:04:18,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3235360.0, ans=0.125 2023-11-27 21:04:19,751 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4350, loss[loss=0.0611, simple_loss=0.08402, pruned_loss=0.01079, audio_tagging_loss=0.0083, over 15110.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09225, pruned_loss=0.01286, audio_tagging_loss=0.008402, over 3045873.46 frames. 
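], batch size: 58, lr: 1.65e-03, grad_scale: 8.0

The grad_scale field is the dynamic loss-scaling factor of mixed-precision (fp16) training: it is halved when scaled gradients overflow (hence the drop to 8.0 here) and doubled again after a long enough run of overflow-free steps, which is why nearby batches report 16.0 and 32.0. The sketch below shows the standard PyTorch mechanism with placeholder model, optimizer and data; it is not this recipe's training loop.

    import torch

    model = torch.nn.Linear(80, 500).cuda()        # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # starting grad_scale
        growth_factor=2.0,     # doubled after `growth_interval` clean steps
        backoff_factor=0.5,    # halved on inf/NaN gradients (16.0 -> 8.0)
        growth_interval=2000,
    )

    features = torch.randn(4, 80, device="cuda")   # placeholder batch
    targets = torch.randn(4, 500, device="cuda")

    with torch.cuda.amp.autocast():                # fp16 forward pass
        loss = torch.nn.functional.mse_loss(model(features), targets)

    scaler.scale(loss).backward()                  # backward on scaled loss
    scaler.step(optimizer)                         # unscales; skips on overflow
    scaler.update()                                # grows or backs off the scale
    print(scaler.get_scale())                      # the value logged as grad_scale
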
2023-11-27 21:04:59,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3235560.0, ans=0.125 2023-11-27 21:05:06,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 9.007e+01 9.649e+01 1.037e+02 1.293e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 21:05:08,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3235626.6666666665, ans=0.125 2023-11-27 21:05:10,087 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485350 2023-11-27 21:05:12,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3235626.6666666665, ans=0.125 2023-11-27 21:05:12,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2023-11-27 21:05:16,686 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4400, loss[loss=0.06051, simple_loss=0.08582, pruned_loss=0.008127, audio_tagging_loss=0.009469, over 16224.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.0915, pruned_loss=0.01278, audio_tagging_loss=0.008411, over 3050935.32 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:05:25,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3235693.3333333335, ans=0.1 2023-11-27 21:06:00,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3235960.0, ans=10.0 2023-11-27 21:06:05,963 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485400 2023-11-27 21:06:13,254 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4450, loss[loss=0.07139, simple_loss=0.09378, pruned_loss=0.01465, audio_tagging_loss=0.009853, over 15355.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09207, pruned_loss=0.01286, audio_tagging_loss=0.008477, over 3055388.83 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:06:17,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3236026.6666666665, ans=0.2 2023-11-27 21:06:17,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3236026.6666666665, ans=0.125 2023-11-27 21:06:20,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0 2023-11-27 21:06:42,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3236160.0, ans=0.0 2023-11-27 21:07:00,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.705e+01 9.463e+01 1.011e+02 1.177e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 21:07:03,694 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485450 2023-11-27 21:07:11,585 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4500, loss[loss=0.06172, simple_loss=0.08433, pruned_loss=0.01135, audio_tagging_loss=0.008212, over 15861.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09213, pruned_loss=0.01291, audio_tagging_loss=0.00848, over 3057650.78 frames.
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:07:21,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=22.5 2023-11-27 21:07:37,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-11-27 21:07:48,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3236560.0, ans=0.125 2023-11-27 21:07:54,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3236560.0, ans=0.125 2023-11-27 21:07:56,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3236626.6666666665, ans=0.0 2023-11-27 21:07:57,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2023-11-27 21:08:01,653 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485500 2023-11-27 21:08:08,323 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4550, loss[loss=0.08108, simple_loss=0.1101, pruned_loss=0.01875, audio_tagging_loss=0.007297, over 15242.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09151, pruned_loss=0.01275, audio_tagging_loss=0.008532, over 3049740.88 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:08:10,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3236693.3333333335, ans=0.0 2023-11-27 21:08:12,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3236693.3333333335, ans=0.125 2023-11-27 21:08:16,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3236693.3333333335, ans=0.1 2023-11-27 21:08:18,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3236760.0, ans=0.125 2023-11-27 21:08:18,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236760.0, ans=0.1 2023-11-27 21:08:52,652 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:08:53,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. 
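limit=15.0

The optim.py:476 entries, such as the one immediately below, summarize the gradient norms of recent optimizer steps as five quantiles (min, 25%, median, 75%, max) alongside the active clipping threshold; note that the threshold tracks Clipping_scale=2.0 times the running median (below, 2.0 * 9.237e+01 gives the reported 1.847e+02) and percent-clipped stays at 0.0 because even the largest recent norm is under it. A sliding-window sketch of that bookkeeping follows; the optimizer's real estimator may differ.

    import collections
    import torch

    class GradNormClipper:
        """Clip gradients to `clipping_scale` times the running median norm.

        A sliding-window sketch of the statistics behind the
        'Clipping_scale=2.0, grad-norm quartiles ...' lines; illustrative
        only, not the actual optimizer code.
        """

        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=window)
            self.num_steps = 0
            self.num_clipped = 0

        def __call__(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.stack([p.grad.norm() for p in params]).norm().item()
            self.norms.append(norm)
            self.num_steps += 1
            threshold = self.clipping_scale * self.quantiles()[2]  # median
            if norm > threshold:
                self.num_clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            return norm

        def quantiles(self):
            # min, 25%, median, 75%, max -- the five numbers in the log.
            t = torch.tensor(list(self.norms))
            qs = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
            return torch.quantile(t, qs).tolist()

        def percent_clipped(self) -> float:
            return 100.0 * self.num_clipped / max(1, self.num_steps)
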
2023-11-27 21:08:54,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 8.582e+01 9.237e+01 9.730e+01 1.372e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 21:08:55,252 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:08:58,309 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485550 2023-11-27 21:09:04,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3237026.6666666665, ans=0.125 2023-11-27 21:09:05,343 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4600, loss[loss=0.05713, simple_loss=0.08067, pruned_loss=0.009308, audio_tagging_loss=0.007489, over 15395.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09131, pruned_loss=0.01274, audio_tagging_loss=0.008652, over 3052284.10 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:09:29,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2023-11-27 21:09:49,885 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:09:55,234 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485600 2023-11-27 21:10:02,109 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4650, loss[loss=0.06259, simple_loss=0.08525, pruned_loss=0.01209, audio_tagging_loss=0.007874, over 16375.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09085, pruned_loss=0.01269, audio_tagging_loss=0.008753, over 3050763.41 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:10:04,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3237360.0, ans=0.0 2023-11-27 21:10:18,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3237426.6666666665, ans=0.125 2023-11-27 21:10:24,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3237493.3333333335, ans=0.0 2023-11-27 21:10:30,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3237493.3333333335, ans=0.0 2023-11-27 21:10:40,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3237560.0, ans=0.0 2023-11-27 21:10:49,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.818e+01 9.409e+01 9.994e+01 1.817e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 21:10:52,662 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485650 2023-11-27 21:10:59,621 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4700, loss[loss=0.07989, simple_loss=0.1082, pruned_loss=0.01688, audio_tagging_loss=0.008907, over 15312.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08976, pruned_loss=0.01257, audio_tagging_loss=0.008943, over 3046646.95 frames.
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:11:13,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3237760.0, ans=0.0 2023-11-27 21:11:45,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3237960.0, ans=0.0 2023-11-27 21:11:48,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3237960.0, ans=0.5 2023-11-27 21:11:49,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485700 2023-11-27 21:11:56,976 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4750, loss[loss=0.06431, simple_loss=0.08669, pruned_loss=0.01292, audio_tagging_loss=0.008045, over 15315.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08894, pruned_loss=0.01255, audio_tagging_loss=0.00904, over 3042397.77 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:12:00,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3238026.6666666665, ans=0.125 2023-11-27 21:12:02,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3238026.6666666665, ans=0.95 2023-11-27 21:12:17,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3238093.3333333335, ans=0.125 2023-11-27 21:12:43,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.911e+01 9.575e+01 1.033e+02 1.448e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 21:12:45,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3238293.3333333335, ans=0.125 2023-11-27 21:12:46,966 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485750 2023-11-27 21:12:53,377 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4800, loss[loss=0.07968, simple_loss=0.1127, pruned_loss=0.01525, audio_tagging_loss=0.008065, over 16480.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08901, pruned_loss=0.01251, audio_tagging_loss=0.009061, over 3045084.65 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:13:02,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3238360.0, ans=0.04949747468305833 2023-11-27 21:13:10,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.77 vs. 
limit=22.5 2023-11-27 21:13:18,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3238493.3333333335, ans=0.125 2023-11-27 21:13:24,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3238493.3333333335, ans=0.125 2023-11-27 21:13:31,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3238560.0, ans=0.0 2023-11-27 21:13:33,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3238560.0, ans=0.125 2023-11-27 21:13:44,077 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485800 2023-11-27 21:13:50,932 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4850, loss[loss=0.06602, simple_loss=0.08232, pruned_loss=0.0148, audio_tagging_loss=0.01005, over 14287.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08974, pruned_loss=0.01244, audio_tagging_loss=0.009003, over 3041031.70 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:14:02,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3238760.0, ans=0.125 2023-11-27 21:14:10,147 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:14:22,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2023-11-27 21:14:39,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.891e+01 9.390e+01 1.010e+02 1.385e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 21:14:42,985 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485850 2023-11-27 21:14:47,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3238960.0, ans=0.2 2023-11-27 21:14:49,863 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4900, loss[loss=0.04962, simple_loss=0.05994, pruned_loss=0.0108, audio_tagging_loss=0.00885, over 14222.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09057, pruned_loss=0.01255, audio_tagging_loss=0.008902, over 3039174.03 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:14:55,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3239026.6666666665, ans=0.125 2023-11-27 21:15:07,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3239093.3333333335, ans=0.0 2023-11-27 21:15:16,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. 
limit=15.0 2023-11-27 21:15:28,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3239226.6666666665, ans=0.125 2023-11-27 21:15:47,427 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485900 2023-11-27 21:15:54,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3239293.3333333335, ans=0.125 2023-11-27 21:15:57,718 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4950, loss[loss=0.07928, simple_loss=0.1068, pruned_loss=0.01708, audio_tagging_loss=0.008816, over 13166.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09, pruned_loss=0.01249, audio_tagging_loss=0.008796, over 3034093.93 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:16:22,131 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:16:30,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3239426.6666666665, ans=0.125 2023-11-27 21:16:31,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3239426.6666666665, ans=0.2 2023-11-27 21:16:32,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3239426.6666666665, ans=0.0 2023-11-27 21:16:34,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3239493.3333333335, ans=0.2 2023-11-27 21:17:30,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.683e+01 9.334e+01 9.956e+01 1.191e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 21:17:36,373 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485950 2023-11-27 21:17:36,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3239626.6666666665, ans=0.1 2023-11-27 21:17:38,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3239626.6666666665, ans=0.1 2023-11-27 21:17:43,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=12.0 2023-11-27 21:17:48,894 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5000, loss[loss=0.06296, simple_loss=0.0856, pruned_loss=0.01112, audio_tagging_loss=0.00904, over 15087.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08997, pruned_loss=0.0126, audio_tagging_loss=0.008739, over 3035880.97 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:18:41,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3239826.6666666665, ans=10.0 2023-11-27 21:18:49,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.12 vs. 
limit=22.5 2023-11-27 21:19:03,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3239960.0, ans=0.2 2023-11-27 21:19:05,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3239960.0, ans=0.125 2023-11-27 21:19:05,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3239960.0, ans=0.0 2023-11-27 21:19:09,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3239960.0, ans=0.2 2023-11-27 21:19:11,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486000 2023-11-27 21:19:13,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3239960.0, ans=0.125 2023-11-27 21:19:13,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3239960.0, ans=0.1 2023-11-27 21:19:13,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3239960.0, ans=0.0 2023-11-27 21:19:21,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3240026.6666666665, ans=0.0 2023-11-27 21:19:22,883 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5050, loss[loss=0.053, simple_loss=0.06571, pruned_loss=0.01145, audio_tagging_loss=0.008699, over 15695.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08928, pruned_loss=0.01248, audio_tagging_loss=0.008793, over 3041320.40 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:19:51,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.88 vs. limit=15.0 2023-11-27 21:20:04,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3240093.3333333335, ans=0.125 2023-11-27 21:22:11,279 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.669e+01 9.381e+01 9.908e+01 1.305e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 21:22:22,489 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486050 2023-11-27 21:22:53,544 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5100, loss[loss=0.07482, simple_loss=0.1045, pruned_loss=0.01493, audio_tagging_loss=0.007633, over 15366.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0891, pruned_loss=0.01241, audio_tagging_loss=0.008794, over 3034717.76 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:23:05,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3240360.0, ans=0.0 2023-11-27 21:25:00,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3240493.3333333335, ans=0.125 2023-11-27 21:25:23,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3240560.0, ans=0.125 2023-11-27 21:25:28,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. 
limit=6.0 2023-11-27 21:26:14,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486100 2023-11-27 21:26:47,643 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5150, loss[loss=0.07099, simple_loss=0.09829, pruned_loss=0.01189, audio_tagging_loss=0.009949, over 15500.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08955, pruned_loss=0.01249, audio_tagging_loss=0.008736, over 3034900.63 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:27:43,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3240760.0, ans=0.125 2023-11-27 21:28:20,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3240826.6666666665, ans=0.2 2023-11-27 21:28:48,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3240826.6666666665, ans=0.0 2023-11-27 21:29:44,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3240893.3333333335, ans=0.1 2023-11-27 21:29:58,838 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.896e+01 9.394e+01 1.012e+02 1.340e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 21:30:05,422 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486150 2023-11-27 21:30:28,620 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5200, loss[loss=0.09353, simple_loss=0.1309, pruned_loss=0.02038, audio_tagging_loss=0.007694, over 15077.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09015, pruned_loss=0.0127, audio_tagging_loss=0.008685, over 3038529.20 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:31:36,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3241093.3333333335, ans=0.125 2023-11-27 21:32:08,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3241160.0, ans=0.125 2023-11-27 21:33:19,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486200 2023-11-27 21:33:42,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3241360.0, ans=0.07 2023-11-27 21:33:43,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=10.0 2023-11-27 21:33:45,340 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5250, loss[loss=0.05427, simple_loss=0.07127, pruned_loss=0.01066, audio_tagging_loss=0.007978, over 14812.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09118, pruned_loss=0.01269, audio_tagging_loss=0.008591, over 3048753.23 frames. 
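Note on reading the train_asr.py:1235 records: loss[...] reports the current batch and tot_loss[...] a running aggregate over roughly three million frames. Across both, the headline loss is consistent with a fixed linear combination of the components, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 and 1.0 weights here are inferred from the logged numbers rather than quoted from the code, so treat them as assumptions in this small check:

    # Hypothetical reconstruction of the logged headline loss; the weights
    # (0.5 for simple_loss, 1.0 elsewhere) are inferred from the log itself.
    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Batch 5250 tot_loss above: 0.5*0.09118 + 0.01269 + 0.008591 = 0.066871
    assert abs(combined_loss(0.09118, 0.01269, 0.008591) - 0.06687) < 1e-4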
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:34:12,413 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:34:48,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3241493.3333333335, ans=0.125 2023-11-27 21:35:20,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3241560.0, ans=0.0 2023-11-27 21:36:09,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.648e+01 9.300e+01 9.886e+01 1.149e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 21:36:11,418 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486250 2023-11-27 21:36:13,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3241626.6666666665, ans=0.125 2023-11-27 21:36:31,501 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5300, loss[loss=0.0564, simple_loss=0.07097, pruned_loss=0.00927, audio_tagging_loss=0.01164, over 15733.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09142, pruned_loss=0.01262, audio_tagging_loss=0.00853, over 3053072.89 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:36:45,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3241693.3333333335, ans=0.05 2023-11-27 21:37:09,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3241760.0, ans=0.125 2023-11-27 21:37:56,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3241893.3333333335, ans=0.0 2023-11-27 21:38:06,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-27 21:38:27,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3241960.0, ans=0.1 2023-11-27 21:38:37,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3241960.0, ans=0.2 2023-11-27 21:38:40,093 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486300 2023-11-27 21:38:57,800 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5350, loss[loss=0.04459, simple_loss=0.05443, pruned_loss=0.006698, audio_tagging_loss=0.01067, over 15844.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09153, pruned_loss=0.0127, audio_tagging_loss=0.008568, over 3045045.38 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:39:02,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.05 vs. 
limit=12.0 2023-11-27 21:39:28,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3242093.3333333335, ans=0.2 2023-11-27 21:39:53,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3242160.0, ans=0.2 2023-11-27 21:40:18,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3242226.6666666665, ans=0.1 2023-11-27 21:40:57,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.785e+01 8.875e+01 9.269e+01 1.018e+02 1.292e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-27 21:40:59,737 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486350 2023-11-27 21:41:13,625 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5400, loss[loss=0.06337, simple_loss=0.07862, pruned_loss=0.01471, audio_tagging_loss=0.009348, over 14940.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09239, pruned_loss=0.01278, audio_tagging_loss=0.008573, over 3050081.44 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:41:14,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3242360.0, ans=0.09899494936611666 2023-11-27 21:41:15,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3242360.0, ans=0.04949747468305833 2023-11-27 21:41:17,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3242360.0, ans=0.125 2023-11-27 21:41:58,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3242426.6666666665, ans=0.125 2023-11-27 21:42:02,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-27 21:42:05,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3242493.3333333335, ans=0.0 2023-11-27 21:42:55,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2023-11-27 21:43:16,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486400 2023-11-27 21:43:26,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3242626.6666666665, ans=0.0 2023-11-27 21:43:34,448 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5450, loss[loss=0.07075, simple_loss=0.0964, pruned_loss=0.01243, audio_tagging_loss=0.01012, over 14221.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09201, pruned_loss=0.01279, audio_tagging_loss=0.008628, over 3050337.79 frames. 
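The scaling.py:213 lines that dominate this log each report the current value (`ans`) of a ScheduledFloat: a hyperparameter (dropout probability, skip rate, minimum bypass scale, ...) that is a piecewise-linear function of the global batch count rather than a constant. A minimal stand-in for that behaviour (the real class in icefall's scaling.py handles more cases; this is a simplified sketch):

    # Sketch of a batch-count-scheduled float, assuming piecewise-linear
    # interpolation between (batch_count, value) knots.
    class ScheduledFloat:
        def __init__(self, *knots):
            # e.g. (0.0, 0.3), (20000.0, 0.125): decay from 0.3 to 0.125
            self.knots = sorted(knots)

        def value(self, batch_count: float) -> float:
            ks = self.knots
            if batch_count <= ks[0][0]:
                return ks[0][1]
            if batch_count >= ks[-1][0]:
                return ks[-1][1]
            for (x0, y0), (x1, y1) in zip(ks, ks[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
    # Long past the final knot the value sits at its floor, matching the
    # ubiquitous "ans=0.125" at batch_count ~ 3.2e6 in the records above.
    assert prob.value(3238560.0) == 0.125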
], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:44:11,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3242760.0, ans=0.2 2023-11-27 21:44:21,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3242760.0, ans=0.0 2023-11-27 21:44:22,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3242826.6666666665, ans=0.125 2023-11-27 21:45:09,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3242893.3333333335, ans=10.0 2023-11-27 21:45:25,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.703e+01 9.302e+01 9.943e+01 1.219e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 21:45:27,685 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486450 2023-11-27 21:45:40,827 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5500, loss[loss=0.05187, simple_loss=0.05247, pruned_loss=0.01478, audio_tagging_loss=0.01086, over 14937.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09234, pruned_loss=0.0129, audio_tagging_loss=0.008664, over 3051579.95 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:46:48,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3243160.0, ans=0.125 2023-11-27 21:47:00,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3243226.6666666665, ans=0.0 2023-11-27 21:47:00,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3243226.6666666665, ans=10.0 2023-11-27 21:47:36,454 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486500 2023-11-27 21:47:53,110 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5550, loss[loss=0.06856, simple_loss=0.08636, pruned_loss=0.0168, audio_tagging_loss=0.008582, over 14428.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.0914, pruned_loss=0.01285, audio_tagging_loss=0.008754, over 3050680.16 frames. 
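The optim.py:476 records summarize gradient clipping in the ScaledAdam-style optimizer: a window of recent gradient norms is kept, their (min, 25%, 50%, 75%, max) quartiles are logged, and a batch is clipped when its norm exceeds Clipping_scale times the running median. That rule is consistent with the numbers above (2.0 x 9.302e+01 = 1.860e+02, the printed threshold), and percent-clipped=0.0 says no batch in the window exceeded it. A sketch assuming this median-based rule; the exact bookkeeping in icefall's optim.py differs in detail:

    from collections import deque
    import torch

    # Clip gradients against a threshold derived from recent history,
    # assuming threshold = clipping_scale * median(recent norms).
    class MedianClipper:
        def __init__(self, clipping_scale=2.0, window=1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def clip_(self, params):
            norm = torch.norm(torch.stack(
                [p.grad.norm() for p in params if p.grad is not None]))
            self.norms.append(norm.item())
            history = sorted(self.norms)
            median = history[len(history) // 2]
            threshold = self.clipping_scale * median
            if norm.item() > threshold:
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm.item())
            return norm.item(), threshold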
], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 21:48:18,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3243426.6666666665, ans=0.0 2023-11-27 21:48:46,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3243493.3333333335, ans=0.0 2023-11-27 21:49:18,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3243560.0, ans=0.125 2023-11-27 21:49:29,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3243626.6666666665, ans=0.2 2023-11-27 21:49:40,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.844e+01 9.360e+01 9.886e+01 1.640e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 21:49:40,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486550 2023-11-27 21:49:41,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3243626.6666666665, ans=0.02 2023-11-27 21:49:53,573 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5600, loss[loss=0.09188, simple_loss=0.139, pruned_loss=0.01556, audio_tagging_loss=0.006818, over 17319.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09202, pruned_loss=0.01295, audio_tagging_loss=0.008909, over 3042177.16 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:50:00,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3243693.3333333335, ans=0.125 2023-11-27 21:50:11,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3243693.3333333335, ans=0.125 2023-11-27 21:50:56,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3243826.6666666665, ans=0.125 2023-11-27 21:51:20,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2023-11-27 21:51:25,587 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:51:43,013 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486600 2023-11-27 21:51:58,857 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5650, loss[loss=0.05991, simple_loss=0.08824, pruned_loss=0.007585, audio_tagging_loss=0.008198, over 16102.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09137, pruned_loss=0.01263, audio_tagging_loss=0.008858, over 3046034.84 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:52:33,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3244093.3333333335, ans=0.0 2023-11-27 21:52:49,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.84 vs. 
limit=15.0 2023-11-27 21:53:32,319 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.720e+01 9.211e+01 9.882e+01 1.405e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 21:53:32,478 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486650 2023-11-27 21:53:41,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3244360.0, ans=0.025 2023-11-27 21:53:42,044 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5700, loss[loss=0.07324, simple_loss=0.09641, pruned_loss=0.01592, audio_tagging_loss=0.009116, over 14697.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09114, pruned_loss=0.01254, audio_tagging_loss=0.008869, over 3048389.41 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:54:03,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3244426.6666666665, ans=0.125 2023-11-27 21:54:22,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3244426.6666666665, ans=0.025 2023-11-27 21:54:30,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3244493.3333333335, ans=0.0 2023-11-27 21:54:38,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-27 21:54:49,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3244560.0, ans=0.1 2023-11-27 21:55:16,606 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486700 2023-11-27 21:55:29,012 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5750, loss[loss=0.08108, simple_loss=0.1063, pruned_loss=0.01968, audio_tagging_loss=0.008241, over 14607.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09018, pruned_loss=0.01263, audio_tagging_loss=0.008853, over 3043229.93 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:55:42,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3244693.3333333335, ans=0.125 2023-11-27 21:55:43,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3244693.3333333335, ans=0.0 2023-11-27 21:56:32,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3244893.3333333335, ans=0.0 2023-11-27 21:56:55,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.667e+01 9.281e+01 1.002e+02 1.374e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 21:56:55,391 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486750 2023-11-27 21:57:08,427 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5800, loss[loss=0.08441, simple_loss=0.1243, pruned_loss=0.01564, audio_tagging_loss=0.006642, over 15965.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09124, pruned_loss=0.01286, audio_tagging_loss=0.008704, over 3043873.94 frames. 
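The WARNING above (train_asr.py:1481) shows why some AudioSet cuts are dropped: they carry a dummy placeholder transcript, and after the convolutional subsampling front-end a 1-second cut keeps only 23 frames, fewer than its 24 BPE tokens, which the pruned transducer loss cannot align (it needs at least one output frame per token). The 100 -> 23 arithmetic matches the usual ((n - 7) // 2 + 1) // 2 subsampling formula; the formula and helper names below are assumptions consistent with that record:

    # Sketch of the exclusion rule suggested by the warning above.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, tokens: list) -> bool:
        T = frames_after_subsampling(num_frames)
        return T >= len(tokens)  # pruned RNN-T needs frames >= tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, ["tok"] * 24)   # the excluded dummy cut above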
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:57:08,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3245026.6666666665, ans=0.025 2023-11-27 21:57:45,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3245160.0, ans=0.0 2023-11-27 21:57:57,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3245160.0, ans=0.125 2023-11-27 21:58:24,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3245293.3333333335, ans=0.0 2023-11-27 21:58:30,880 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486800 2023-11-27 21:58:35,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3245293.3333333335, ans=0.1 2023-11-27 21:58:42,514 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5850, loss[loss=0.04925, simple_loss=0.06372, pruned_loss=0.01004, audio_tagging_loss=0.007344, over 14743.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09016, pruned_loss=0.0126, audio_tagging_loss=0.008657, over 3045178.03 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:58:44,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.81 vs. limit=10.0 2023-11-27 21:58:44,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2023-11-27 21:58:58,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3245360.0, ans=0.125 2023-11-27 21:59:20,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3245493.3333333335, ans=0.1 2023-11-27 22:00:03,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.902e+01 9.558e+01 1.050e+02 1.471e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 22:00:04,103 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486850 2023-11-27 22:00:13,962 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5900, loss[loss=0.08258, simple_loss=0.1214, pruned_loss=0.01427, audio_tagging_loss=0.007598, over 15458.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09139, pruned_loss=0.01285, audio_tagging_loss=0.008638, over 3046044.26 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:00:19,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3245693.3333333335, ans=0.0 2023-11-27 22:00:51,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3245826.6666666665, ans=0.125 2023-11-27 22:01:08,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3245893.3333333335, ans=0.1 2023-11-27 22:01:18,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3245960.0, ans=0.5 2023-11-27 22:01:27,191 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486900 2023-11-27 22:01:33,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3245960.0, ans=0.2 2023-11-27 22:01:36,034 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5950, loss[loss=0.06906, simple_loss=0.1, pruned_loss=0.01469, audio_tagging_loss=0.004362, over 15730.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09148, pruned_loss=0.0127, audio_tagging_loss=0.008545, over 3052806.82 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:01:51,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3246093.3333333335, ans=0.0 2023-11-27 22:01:54,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3246093.3333333335, ans=0.2 2023-11-27 22:02:11,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3246160.0, ans=0.125 2023-11-27 22:02:34,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2023-11-27 22:02:43,391 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.680e+01 9.187e+01 9.808e+01 1.354e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-27 22:02:43,498 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486950 2023-11-27 22:02:53,404 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6000, loss[loss=0.05621, simple_loss=0.08693, pruned_loss=0.00421, audio_tagging_loss=0.008531, over 15684.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09128, pruned_loss=0.01264, audio_tagging_loss=0.00852, over 3044643.16 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:02:53,405 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 22:03:08,299 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.9999, 2.9111, 3.1772, 2.7071, 3.1417, 2.9255, 2.9459, 3.0105], device='cuda:3') 2023-11-27 22:03:35,231 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05724, simple_loss=0.05055, pruned_loss=0.005142, audio_tagging_loss=0.02682, over 4681554.00 frames. 
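During validation the script also prints a per-head entropy of selected self-attention weight matrices (zipformer.py:1877), a cheap diagnostic for heads that collapse onto single positions (entropy near 0) or stay near-uniform (entropy near the log of the key count). A sketch of that statistic, assuming weights of shape (num_heads, num_queries, num_keys) and averaging over queries; the shapes are our assumption, not zipformer's actual layout:

    import torch

    # Entropy of attention weights per head: H = -sum_k p_k * log(p_k),
    # averaged over query positions.
    def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        # attn: (num_heads, num_queries, num_keys), rows sum to 1
        ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (num_heads, num_queries)
        return ent.mean(dim=-1)                          # one value per head

    attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
    print(attn_weights_entropy(attn))  # one entry per head, as in the log above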
2023-11-27 22:03:35,232 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 22:03:59,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3246426.6666666665, ans=0.125 2023-11-27 22:04:05,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3246493.3333333335, ans=0.2 2023-11-27 22:04:07,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3246493.3333333335, ans=0.0 2023-11-27 22:04:10,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3246493.3333333335, ans=0.0 2023-11-27 22:04:21,386 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:04:30,215 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 22:04:38,859 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487000 2023-11-27 22:04:43,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3246626.6666666665, ans=0.0 2023-11-27 22:04:44,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3246626.6666666665, ans=0.0 2023-11-27 22:04:46,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3246693.3333333335, ans=0.125 2023-11-27 22:04:47,099 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6050, loss[loss=0.05655, simple_loss=0.07423, pruned_loss=0.01159, audio_tagging_loss=0.007847, over 15144.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09117, pruned_loss=0.01264, audio_tagging_loss=0.008565, over 3047369.73 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:04:51,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3246693.3333333335, ans=0.125 2023-11-27 22:04:55,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3246693.3333333335, ans=0.0 2023-11-27 22:05:06,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3246760.0, ans=0.125 2023-11-27 22:05:12,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3246760.0, ans=0.125 2023-11-27 22:05:19,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. 
limit=10.0 2023-11-27 22:05:23,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246826.6666666665, ans=0.1 2023-11-27 22:05:29,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3246893.3333333335, ans=0.04949747468305833 2023-11-27 22:05:30,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3246893.3333333335, ans=0.0 2023-11-27 22:05:30,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3246893.3333333335, ans=0.125 2023-11-27 22:05:47,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487050 2023-11-27 22:05:48,473 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.706e+01 9.274e+01 9.905e+01 1.388e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 22:05:48,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3246960.0, ans=0.125 2023-11-27 22:05:56,285 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6100, loss[loss=0.07358, simple_loss=0.09815, pruned_loss=0.01811, audio_tagging_loss=0.00639, over 14845.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09076, pruned_loss=0.01255, audio_tagging_loss=0.008486, over 3040081.63 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:06:03,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3247026.6666666665, ans=0.2 2023-11-27 22:06:38,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3247226.6666666665, ans=0.125 2023-11-27 22:06:50,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3247293.3333333335, ans=0.125 2023-11-27 22:06:56,209 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487100 2023-11-27 22:07:04,068 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6150, loss[loss=0.05866, simple_loss=0.07918, pruned_loss=0.01176, audio_tagging_loss=0.007307, over 15160.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09058, pruned_loss=0.01259, audio_tagging_loss=0.008571, over 3041824.52 frames. 
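The grad_scale field in the batch summaries (16.0, 32.0, occasionally 8.0) is the dynamic loss scale of mixed-precision training: the scaler halves the scale when a step overflows in fp16 and doubles it again after a long run of clean steps, which is why the value drifts between powers of two across this log. The training script's scaler may be a subclass, but the mechanism sketched here is PyTorch's standard one:

    import torch

    scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

    # Typical loop (model, optimizer, batch assumed to exist):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)   # skipped internally if grads contain inf/nan
    #   scaler.update()          # adjusts the scale -> the logged grad_scale
    #   print("grad_scale:", scaler.get_scale())

    if torch.cuda.is_available():
        # Matches the "Maximum memory allocated so far is ...MB" record above.
        print(torch.cuda.max_memory_allocated() // (1024 * 1024), "MB")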
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:07:05,723 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:07:07,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3247360.0, ans=0.125 2023-11-27 22:07:11,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3247360.0, ans=0.125 2023-11-27 22:07:15,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3247360.0, ans=0.125 2023-11-27 22:07:23,715 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:07:24,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3247426.6666666665, ans=0.0 2023-11-27 22:07:39,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3247493.3333333335, ans=0.125 2023-11-27 22:07:47,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3247560.0, ans=0.125 2023-11-27 22:08:03,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3247626.6666666665, ans=0.125 2023-11-27 22:08:04,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487150 2023-11-27 22:08:05,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.962e+01 9.637e+01 1.023e+02 1.658e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 22:08:11,791 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6200, loss[loss=0.06952, simple_loss=0.09243, pruned_loss=0.01481, audio_tagging_loss=0.008496, over 15844.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09062, pruned_loss=0.01273, audio_tagging_loss=0.00873, over 3045290.54 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:08:33,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.5 2023-11-27 22:08:38,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2023-11-27 22:08:50,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3247826.6666666665, ans=22.5 2023-11-27 22:08:55,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3247893.3333333335, ans=0.125 2023-11-27 22:09:01,611 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:09:09,909 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487200 2023-11-27 22:09:17,720 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6250, loss[loss=0.05458, simple_loss=0.07909, pruned_loss=0.006342, audio_tagging_loss=0.00869, over 14995.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08875, pruned_loss=0.0125, audio_tagging_loss=0.008895, over 3044837.27 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:09:23,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=15.0 2023-11-27 22:09:34,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3248093.3333333335, ans=10.0 2023-11-27 22:09:35,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3248093.3333333335, ans=0.125 2023-11-27 22:09:50,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3248160.0, ans=0.2 2023-11-27 22:09:55,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3248160.0, ans=0.07 2023-11-27 22:10:06,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-27 22:10:07,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0 2023-11-27 22:10:08,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3248293.3333333335, ans=0.125 2023-11-27 22:10:15,205 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487250 2023-11-27 22:10:17,264 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.680e+01 9.045e+01 9.912e+01 1.334e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 22:10:23,261 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6300, loss[loss=0.03866, simple_loss=0.04513, pruned_loss=0.006047, audio_tagging_loss=0.01005, over 14015.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08879, pruned_loss=0.01248, audio_tagging_loss=0.008949, over 3048129.92 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:10:27,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3248360.0, ans=0.125 2023-11-27 22:11:04,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3248560.0, ans=0.125 2023-11-27 22:11:14,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.16 vs. limit=15.0 2023-11-27 22:11:15,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3248626.6666666665, ans=0.125 2023-11-27 22:11:20,341 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487300 2023-11-27 22:11:27,398 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6350, loss[loss=0.06268, simple_loss=0.08981, pruned_loss=0.01077, audio_tagging_loss=0.007011, over 14676.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08939, pruned_loss=0.01263, audio_tagging_loss=0.008965, over 3044938.18 frames. 
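Many of the scheduled values above belong to Balancer modules (balancer1.prob, balancer2.min_positive, balancer1.max_abs, ...). A balancer is an identity in the forward pass that nudges gradients in the backward pass when per-channel statistics drift outside configured bounds, for example the fraction of positive activations falling below min_positive or the mean absolute value exceeding max_abs, and it applies the correction only with probability prob (the 0.125 seen everywhere) to keep the cost low. A much-simplified sketch of the idea, not the icefall implementation; the penalty form and constants are illustrative:

    import torch

    # Identity forward; in backward, add a small gradient pushing channels
    # whose positive fraction is below min_positive toward positive values.
    class SimpleBalancer(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, min_positive, strength):
            ctx.save_for_backward(x)
            ctx.min_positive, ctx.strength = min_positive, strength
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            pos_frac = (x > 0).float().mean(dim=0)        # per-channel stat
            low = (pos_frac < ctx.min_positive).float()
            # Subtracting from the grad drives violating channels upward.
            grad_x = grad_out - ctx.strength * low * torch.ones_like(x)
            return grad_x, None, None

    x = torch.randn(8, 16, requires_grad=True)
    SimpleBalancer.apply(x, 0.05, 1e-4).sum().backward()  # identity output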
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:11:38,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3248760.0, ans=0.125 2023-11-27 22:11:43,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3248760.0, ans=0.0 2023-11-27 22:11:45,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3248760.0, ans=0.1 2023-11-27 22:11:50,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3248826.6666666665, ans=0.0 2023-11-27 22:11:52,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3248826.6666666665, ans=0.2 2023-11-27 22:11:56,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3248826.6666666665, ans=0.2 2023-11-27 22:12:23,526 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487350 2023-11-27 22:12:24,696 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.655e+01 9.162e+01 9.797e+01 1.327e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 22:12:31,048 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6400, loss[loss=0.06475, simple_loss=0.08599, pruned_loss=0.009421, audio_tagging_loss=0.01233, over 15198.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08945, pruned_loss=0.01253, audio_tagging_loss=0.009032, over 3048716.05 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:12:32,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3249026.6666666665, ans=15.0 2023-11-27 22:12:50,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0 2023-11-27 22:13:12,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3249226.6666666665, ans=0.2 2023-11-27 22:13:28,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487400 2023-11-27 22:13:36,674 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6450, loss[loss=0.06293, simple_loss=0.09372, pruned_loss=0.009152, audio_tagging_loss=0.006919, over 15178.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08958, pruned_loss=0.01269, audio_tagging_loss=0.008987, over 3046149.10 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:14:00,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3249426.6666666665, ans=0.125 2023-11-27 22:14:16,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-27 22:14:34,782 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487450 2023-11-27 22:14:37,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.688e+01 9.330e+01 9.887e+01 1.158e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 22:14:42,188 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6500, loss[loss=0.06096, simple_loss=0.0898, pruned_loss=0.009127, audio_tagging_loss=0.006938, over 16210.00 frames. 
], tot_loss[loss=0.06631, simple_loss=0.08949, pruned_loss=0.01264, audio_tagging_loss=0.008929, over 3044222.45 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:15:19,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3249893.3333333335, ans=0.1 2023-11-27 22:15:38,352 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487500 2023-11-27 22:15:41,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-11-27 22:15:45,971 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6550, loss[loss=0.07617, simple_loss=0.1073, pruned_loss=0.01575, audio_tagging_loss=0.006784, over 16218.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08929, pruned_loss=0.01252, audio_tagging_loss=0.008877, over 3046090.66 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:16:04,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.41 vs. limit=10.0 2023-11-27 22:16:32,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.05 vs. limit=10.0 2023-11-27 22:16:42,962 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487550 2023-11-27 22:16:45,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.718e+01 9.335e+01 9.836e+01 1.577e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 22:16:51,221 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6600, loss[loss=0.03576, simple_loss=0.04398, pruned_loss=0.003217, audio_tagging_loss=0.01055, over 13993.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08896, pruned_loss=0.01239, audio_tagging_loss=0.008825, over 3044801.98 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:17:16,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3250493.3333333335, ans=0.5 2023-11-27 22:17:36,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3250560.0, ans=0.125 2023-11-27 22:17:39,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3250560.0, ans=0.0 2023-11-27 22:17:42,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2023-11-27 22:17:47,811 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487600 2023-11-27 22:17:49,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3250626.6666666665, ans=0.125 2023-11-27 22:17:56,533 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6650, loss[loss=0.06812, simple_loss=0.09651, pruned_loss=0.01205, audio_tagging_loss=0.007814, over 14857.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08884, pruned_loss=0.01239, audio_tagging_loss=0.00875, over 3048599.89 frames. 
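The scaling.py:1022 lines report a Whiten diagnostic: for each named activation, a metric measuring how far the (grouped) channel covariance is from a multiple of the identity, compared against a whitening limit (the recurring "vs. limit=22.5", "limit=15.0", ...). When the metric exceeds its limit the module applies a gradient penalty that decorrelates the channels; the records here sit below their limits, so they are informational. A simplified version of such a metric, normalized so a perfectly white covariance scores exactly 1; the exact icefall formula may differ:

    import torch

    # 1.0 iff the covariance is a multiple of the identity, larger as
    # channels become correlated or unevenly scaled.
    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        n, c = x.shape
        cov = (x.T @ x) / n                      # (c, c) covariance
        diag_mean = cov.diagonal().mean()
        return ((cov ** 2).sum() / (c * diag_mean ** 2)).item()

    white = torch.randn(10000, 64)                        # ~identity covariance
    print(whitening_metric(white))                        # close to 1.0
    print(whitening_metric(white @ torch.randn(64, 64)))  # noticeably larger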
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:18:00,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3250693.3333333335, ans=0.125 2023-11-27 22:18:08,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3250760.0, ans=0.0 2023-11-27 22:18:30,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3250826.6666666665, ans=0.125 2023-11-27 22:18:49,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3250960.0, ans=0.1 2023-11-27 22:18:52,915 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487650 2023-11-27 22:18:55,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.676e+01 9.213e+01 9.807e+01 1.195e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 22:19:00,103 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6700, loss[loss=0.0747, simple_loss=0.1051, pruned_loss=0.0143, audio_tagging_loss=0.007833, over 15819.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08918, pruned_loss=0.01231, audio_tagging_loss=0.008733, over 3042296.99 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:19:02,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3251026.6666666665, ans=0.125 2023-11-27 22:19:20,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3251093.3333333335, ans=0.125 2023-11-27 22:19:23,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3251093.3333333335, ans=0.1 2023-11-27 22:19:33,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3251160.0, ans=0.125 2023-11-27 22:19:41,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3251226.6666666665, ans=0.125 2023-11-27 22:19:56,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487700 2023-11-27 22:19:59,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=22.5 2023-11-27 22:20:04,221 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6750, loss[loss=0.08768, simple_loss=0.1231, pruned_loss=0.02136, audio_tagging_loss=0.004749, over 15201.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08896, pruned_loss=0.01232, audio_tagging_loss=0.008693, over 3043796.90 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:20:20,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3251426.6666666665, ans=0.04949747468305833 2023-11-27 22:20:32,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.14 vs. 
limit=15.0 2023-11-27 22:20:35,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3251493.3333333335, ans=0.125 2023-11-27 22:20:59,908 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487750 2023-11-27 22:21:02,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.663e+01 9.320e+01 9.783e+01 1.430e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-27 22:21:07,720 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6800, loss[loss=0.07127, simple_loss=0.09387, pruned_loss=0.0157, audio_tagging_loss=0.00864, over 15363.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08917, pruned_loss=0.01252, audio_tagging_loss=0.00881, over 3048548.81 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:21:07,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3251693.3333333335, ans=0.0 2023-11-27 22:21:10,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=10.0 2023-11-27 22:21:11,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3251693.3333333335, ans=0.125 2023-11-27 22:21:50,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.27 vs. limit=6.0 2023-11-27 22:21:52,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3251893.3333333335, ans=0.125 2023-11-27 22:21:53,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5 2023-11-27 22:22:03,534 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487800 2023-11-27 22:22:11,463 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6850, loss[loss=0.08569, simple_loss=0.1213, pruned_loss=0.0162, audio_tagging_loss=0.008848, over 15431.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08966, pruned_loss=0.01249, audio_tagging_loss=0.008784, over 3045280.65 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:22:23,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3252093.3333333335, ans=0.125 2023-11-27 22:22:49,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3252226.6666666665, ans=0.125 2023-11-27 22:23:00,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3252293.3333333335, ans=0.0 2023-11-27 22:23:05,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3252293.3333333335, ans=0.0 2023-11-27 22:23:06,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487850 2023-11-27 22:23:09,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3252293.3333333335, ans=0.1 2023-11-27 22:23:10,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.804e+01 9.259e+01 1.010e+02 1.279e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 22:23:14,143 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6900, loss[loss=0.06791, simple_loss=0.1016, pruned_loss=0.008952, audio_tagging_loss=0.008174, over 17202.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08981, pruned_loss=0.01234, audio_tagging_loss=0.008752, over 3044332.64 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:23:16,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3252360.0, ans=0.0 2023-11-27 22:23:26,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3252426.6666666665, ans=0.125 2023-11-27 22:23:37,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3252493.3333333335, ans=0.1 2023-11-27 22:23:39,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3252493.3333333335, ans=0.0 2023-11-27 22:23:46,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3252493.3333333335, ans=0.2 2023-11-27 22:23:55,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-27 22:23:59,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3252560.0, ans=0.125 2023-11-27 22:24:04,079 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 22:24:07,455 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487900 2023-11-27 22:24:14,288 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6950, loss[loss=0.06062, simple_loss=0.08151, pruned_loss=0.01199, audio_tagging_loss=0.007871, over 14889.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09052, pruned_loss=0.01254, audio_tagging_loss=0.008735, over 3043884.42 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:24:41,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3252826.6666666665, ans=0.1 2023-11-27 22:24:50,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2023-11-27 22:24:53,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3252893.3333333335, ans=0.0 2023-11-27 22:25:09,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3252960.0, ans=0.05 2023-11-27 22:25:11,126 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487950 2023-11-27 22:25:17,137 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.695e+01 9.327e+01 1.020e+02 1.737e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 22:25:22,543 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7000, loss[loss=0.055, simple_loss=0.07785, pruned_loss=0.00888, audio_tagging_loss=0.0072, over 14439.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0899, pruned_loss=0.01246, audio_tagging_loss=0.008747, over 3043578.78 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:27:11,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3253160.0, ans=0.125 2023-11-27 22:27:29,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3253226.6666666665, ans=0.2 2023-11-27 22:28:06,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-11-27 22:28:41,375 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488000 2023-11-27 22:29:17,473 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7050, loss[loss=0.09704, simple_loss=0.1468, pruned_loss=0.01894, audio_tagging_loss=0.00469, over 16570.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09025, pruned_loss=0.01254, audio_tagging_loss=0.008793, over 3050784.13 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:29:23,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3253360.0, ans=0.0 2023-11-27 22:29:58,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3253360.0, ans=0.04949747468305833 2023-11-27 22:30:18,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3253426.6666666665, ans=0.0 2023-11-27 22:30:38,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.09 vs. 
limit=15.0 2023-11-27 22:31:19,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2023-11-27 22:32:14,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3253560.0, ans=0.2 2023-11-27 22:32:41,839 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488050 2023-11-27 22:32:42,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=22.5 2023-11-27 22:33:03,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.458e+01 9.244e+01 1.037e+02 2.754e+02, threshold=1.849e+02, percent-clipped=1.0 2023-11-27 22:33:16,712 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7100, loss[loss=0.0581, simple_loss=0.0756, pruned_loss=0.01012, audio_tagging_loss=0.01017, over 14512.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08947, pruned_loss=0.01233, audio_tagging_loss=0.008948, over 3047641.12 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:33:50,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-27 22:34:50,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3253760.0, ans=0.025 2023-11-27 22:35:53,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3253893.3333333335, ans=0.2 2023-11-27 22:36:47,390 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488100 2023-11-27 22:37:13,741 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7150, loss[loss=0.09459, simple_loss=0.1335, pruned_loss=0.02256, audio_tagging_loss=0.005282, over 16100.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09051, pruned_loss=0.01251, audio_tagging_loss=0.008923, over 3044755.28 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:38:06,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3254093.3333333335, ans=0.04949747468305833 2023-11-27 22:38:18,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3254093.3333333335, ans=0.0 2023-11-27 22:39:07,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.93 vs. 
limit=12.0 2023-11-27 22:39:20,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3254160.0, ans=0.0 2023-11-27 22:39:46,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3254226.6666666665, ans=0.2 2023-11-27 22:40:01,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3254226.6666666665, ans=0.1 2023-11-27 22:40:06,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3254226.6666666665, ans=0.1 2023-11-27 22:40:28,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488150 2023-11-27 22:40:35,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3254293.3333333335, ans=0.0 2023-11-27 22:40:45,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.864e+01 9.452e+01 1.007e+02 1.551e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 22:41:01,823 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7200, loss[loss=0.08377, simple_loss=0.1173, pruned_loss=0.01695, audio_tagging_loss=0.008164, over 15515.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09036, pruned_loss=0.01256, audio_tagging_loss=0.008988, over 3045441.74 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:43:18,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3254560.0, ans=0.2 2023-11-27 22:43:46,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488200 2023-11-27 22:44:10,279 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7250, loss[loss=0.07583, simple_loss=0.1037, pruned_loss=0.01371, audio_tagging_loss=0.01025, over 14778.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08976, pruned_loss=0.01237, audio_tagging_loss=0.009044, over 3042227.46 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:44:26,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3254693.3333333335, ans=0.0 2023-11-27 22:44:29,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254693.3333333335, ans=0.1 2023-11-27 22:44:44,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2023-11-27 22:44:47,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3254760.0, ans=0.125 2023-11-27 22:44:58,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2023-11-27 22:45:00,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3254760.0, ans=0.2 2023-11-27 22:45:31,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-27 22:46:11,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.67 vs. 
limit=5.0 2023-11-27 22:46:26,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3254960.0, ans=0.1 2023-11-27 22:46:29,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488250 2023-11-27 22:46:41,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.745e+01 9.249e+01 1.003e+02 1.162e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 22:46:46,957 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7300, loss[loss=0.1055, simple_loss=0.1396, pruned_loss=0.02455, audio_tagging_loss=0.01109, over 16240.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09039, pruned_loss=0.01257, audio_tagging_loss=0.008927, over 3045012.89 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:46:52,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3255026.6666666665, ans=0.125 2023-11-27 22:47:16,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2023-11-27 22:48:32,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3255226.6666666665, ans=0.125 2023-11-27 22:49:05,107 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488300 2023-11-27 22:49:16,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3255293.3333333335, ans=0.125 2023-11-27 22:49:24,625 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7350, loss[loss=0.07199, simple_loss=0.09863, pruned_loss=0.01607, audio_tagging_loss=0.006601, over 14509.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08945, pruned_loss=0.01267, audio_tagging_loss=0.008849, over 3035729.99 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:49:29,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3255360.0, ans=0.2 2023-11-27 22:49:39,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3255360.0, ans=0.125 2023-11-27 22:49:42,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3255360.0, ans=0.0 2023-11-27 22:52:03,383 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488350 2023-11-27 22:52:15,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.717e+01 9.286e+01 1.027e+02 1.219e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 22:52:18,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-27 22:52:20,678 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7400, loss[loss=0.07609, simple_loss=0.105, pruned_loss=0.01427, audio_tagging_loss=0.009339, over 16180.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08994, pruned_loss=0.01278, audio_tagging_loss=0.008741, over 3041077.71 frames. 
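
The stream of "ScheduledFloat: name=..., batch_count=..., ans=..." entries above records regularization hyperparameters (dropout probabilities, skip rates, balancer limits) that are not fixed constants but are looked up as a function of batch_count each time the module runs. A minimal sketch of one way such a schedule can work, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class below and its breakpoints are illustrative, not the exact code behind these log lines:

    import bisect

    class ScheduledFloat:
        """A float hyperparameter interpolated piecewise-linearly in batch_count."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count):
            i = bisect.bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]        # before the first breakpoint
            if i == len(self.xs):
                return self.ys[-1]       # past the last breakpoint: hold flat
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # e.g. a dropout_p that decays from 0.3 to 0.1 over the first 20k batches;
    # at batch_count=3252360.0 (as in the entries above) it sits on the flat tail
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(3252360.0))    # -> 0.1

The logged ans= value is simply the schedule evaluated at the current batch_count; this late in training most schedules have reached their flat final value, which is why the same names keep logging the same numbers.
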
], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:52:27,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-27 22:52:37,131 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:52:58,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3255760.0, ans=0.125 2023-11-27 22:53:06,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3255760.0, ans=0.125 2023-11-27 22:53:44,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3255826.6666666665, ans=0.125 2023-11-27 22:54:59,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488400 2023-11-27 22:55:21,616 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7450, loss[loss=0.0605, simple_loss=0.08259, pruned_loss=0.01261, audio_tagging_loss=0.00659, over 14860.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09167, pruned_loss=0.01312, audio_tagging_loss=0.008621, over 3042847.79 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:56:20,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3256093.3333333335, ans=0.0 2023-11-27 22:56:25,196 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:56:25,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3256160.0, ans=0.1 2023-11-27 22:57:13,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3256226.6666666665, ans=0.125 2023-11-27 22:57:18,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3256226.6666666665, ans=0.1 2023-11-27 22:57:48,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488450 2023-11-27 22:57:59,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.653e+01 9.263e+01 9.964e+01 1.295e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 22:58:07,578 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7500, loss[loss=0.05252, simple_loss=0.07681, pruned_loss=0.007174, audio_tagging_loss=0.006943, over 14847.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09061, pruned_loss=0.013, audio_tagging_loss=0.008609, over 3041467.75 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:58:33,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3256360.0, ans=0.125 2023-11-27 22:59:03,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. 
limit=15.0 2023-11-27 23:00:42,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488500 2023-11-27 23:00:46,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3256626.6666666665, ans=0.1 2023-11-27 23:00:53,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-27 23:01:02,549 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7550, loss[loss=0.08533, simple_loss=0.1211, pruned_loss=0.01711, audio_tagging_loss=0.007669, over 15226.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09112, pruned_loss=0.01304, audio_tagging_loss=0.00855, over 3042103.96 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:01:27,815 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:02:08,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3256760.0, ans=0.0 2023-11-27 23:02:14,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3256826.6666666665, ans=0.0 2023-11-27 23:02:20,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3256826.6666666665, ans=10.0 2023-11-27 23:03:01,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2023-11-27 23:03:30,460 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488550 2023-11-27 23:03:43,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.710e+01 9.273e+01 1.023e+02 1.229e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 23:03:49,371 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7600, loss[loss=0.06403, simple_loss=0.0869, pruned_loss=0.01059, audio_tagging_loss=0.009997, over 14982.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09075, pruned_loss=0.01298, audio_tagging_loss=0.008513, over 3042388.91 frames. 
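
The recurring "Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=..." entries from optim.py summarize the distribution of recent per-batch gradient norms. The five numbers read as min / 25% / median / 75% / max, and in every such entry in this log the reported threshold equals, to rounding, Clipping_scale times the median (in the entry just above, 2.0 x 9.273e+01 = 1.855e+02); percent-clipped reports how often a batch exceeded it. A minimal sketch of that bookkeeping, assuming a fixed-length window of recent norms (the window size here is illustrative):

    from collections import deque
    import statistics

    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, window=200):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent per-batch gradient norms
            self.clipped = 0
            self.seen = 0

        def update(self, grad_norm):
            self.norms.append(grad_norm)
            self.seen += 1
            threshold = self.scale * statistics.median(self.norms)
            if grad_norm > threshold:
                self.clipped += 1
                return threshold / grad_norm   # factor to rescale the gradient by
            return 1.0                         # norm within bounds: no clipping

        def summary(self):
            q1, med, q3 = statistics.quantiles(self.norms, n=4)
            return (min(self.norms), q1, med, q3, max(self.norms),
                    self.scale * med, 100.0 * self.clipped / max(1, self.seen))

Consistent with this reading, the entries whose max norm stays below the threshold report percent-clipped=0.0, while the one above with max 2.754e+02 against threshold 1.849e+02 reports a nonzero percent-clipped.
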
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:04:48,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3257160.0, ans=0.0 2023-11-27 23:04:55,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=3257160.0, ans=12.0 2023-11-27 23:05:27,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3257226.6666666665, ans=0.1 2023-11-27 23:05:59,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3257293.3333333335, ans=0.2 2023-11-27 23:06:06,768 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488600 2023-11-27 23:06:10,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3257293.3333333335, ans=0.125 2023-11-27 23:06:20,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3257293.3333333335, ans=0.2 2023-11-27 23:06:26,493 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7650, loss[loss=0.06591, simple_loss=0.0809, pruned_loss=0.01348, audio_tagging_loss=0.01198, over 15018.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09012, pruned_loss=0.01286, audio_tagging_loss=0.008547, over 3038102.76 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:06:36,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3257360.0, ans=0.0 2023-11-27 23:07:08,926 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:07:17,987 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:07:43,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3257493.3333333335, ans=0.1 2023-11-27 23:07:43,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2023-11-27 23:08:23,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3257626.6666666665, ans=0.1 2023-11-27 23:08:36,748 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488650 2023-11-27 23:08:37,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-27 23:08:45,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3257626.6666666665, ans=0.125 2023-11-27 23:08:50,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.807e+01 9.447e+01 1.017e+02 1.729e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 23:08:53,432 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7700, loss[loss=0.06582, simple_loss=0.08747, pruned_loss=0.01154, audio_tagging_loss=0.01054, over 14734.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08965, pruned_loss=0.01269, audio_tagging_loss=0.00858, over 3038836.48 frames. 
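
The grad_scale field printed with every progress line tracks the dynamic loss-scaling factor used for fp16 training; in this stretch of the log it moves among 8.0, 16.0 and 32.0, which is the usual behaviour of a scaler that halves on overflow and doubles after a run of overflow-free steps. A minimal sketch using the stock PyTorch scaler; the model, batch and optimizer here are placeholders, and the growth_interval value is an assumption:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # in the range of grad_scale values seen in this log
        growth_factor=2.0,    # double after growth_interval overflow-free steps
        backoff_factor=0.5,   # halve whenever an inf/nan gradient is found
        growth_interval=2000,
    )

    def training_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)           # placeholder forward returning a scalar
        scaler.scale(loss).backward()     # backward on the scaled loss
        scaler.step(optimizer)            # unscales; skips the step on overflow
        scaler.update()                   # adjusts the scale for the next batch
        return scaler.get_scale()         # the value logged as grad_scale
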
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:09:25,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3257760.0, ans=0.1 2023-11-27 23:09:57,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2023-11-27 23:10:02,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2023-11-27 23:10:28,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2023-11-27 23:10:55,000 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488700 2023-11-27 23:11:00,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3257960.0, ans=0.125 2023-11-27 23:11:01,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2023-11-27 23:11:16,879 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7750, loss[loss=0.06287, simple_loss=0.0901, pruned_loss=0.008905, audio_tagging_loss=0.00891, over 16562.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08945, pruned_loss=0.01269, audio_tagging_loss=0.008579, over 3037820.54 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:11:27,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3258026.6666666665, ans=0.125 2023-11-27 23:12:14,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3258093.3333333335, ans=0.2 2023-11-27 23:13:02,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3258226.6666666665, ans=0.035 2023-11-27 23:13:31,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2023-11-27 23:13:42,147 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488750 2023-11-27 23:13:55,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.828e+01 9.509e+01 1.004e+02 1.323e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 23:13:57,924 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7800, loss[loss=0.05322, simple_loss=0.06455, pruned_loss=0.01153, audio_tagging_loss=0.009412, over 14438.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08968, pruned_loss=0.01254, audio_tagging_loss=0.008578, over 3040790.53 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:14:29,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3258426.6666666665, ans=0.2 2023-11-27 23:14:41,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. 
limit=6.0 2023-11-27 23:14:42,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3258426.6666666665, ans=0.2 2023-11-27 23:15:12,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3258560.0, ans=0.0 2023-11-27 23:15:50,347 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488800 2023-11-27 23:15:53,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3258626.6666666665, ans=0.09899494936611666 2023-11-27 23:16:03,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3258626.6666666665, ans=0.0 2023-11-27 23:16:08,332 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7850, loss[loss=0.07448, simple_loss=0.1037, pruned_loss=0.01273, audio_tagging_loss=0.009887, over 15228.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08948, pruned_loss=0.01244, audio_tagging_loss=0.008718, over 3035838.27 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:16:21,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3258693.3333333335, ans=0.2 2023-11-27 23:16:25,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3258693.3333333335, ans=10.0 2023-11-27 23:16:27,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3258693.3333333335, ans=0.125 2023-11-27 23:16:31,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3258760.0, ans=0.125 2023-11-27 23:16:54,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3258826.6666666665, ans=0.125 2023-11-27 23:17:06,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3258826.6666666665, ans=0.1 2023-11-27 23:17:34,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3258893.3333333335, ans=0.09899494936611666 2023-11-27 23:17:34,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.71 vs. limit=22.5 2023-11-27 23:17:41,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3258960.0, ans=0.125 2023-11-27 23:17:51,954 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488850 2023-11-27 23:17:52,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3258960.0, ans=0.07 2023-11-27 23:17:56,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3258960.0, ans=0.0 2023-11-27 23:18:02,657 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.704e+01 9.225e+01 1.001e+02 1.986e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-27 23:18:06,092 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7900, loss[loss=0.07477, simple_loss=0.09823, pruned_loss=0.01721, audio_tagging_loss=0.008445, over 15984.00 frames. 
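
Each loss[...] / tot_loss[...] line reports the combined objective next to its components, and the logged numbers are internally consistent: throughout this stretch the total equals, to rounding, 0.5 * simple_loss + pruned_loss + audio_tagging_loss. For batch 7900 above, 0.5 * 0.09823 + 0.01721 + 0.008445 = 0.07477, i.e. the simple (non-pruned) transducer term is down-weighted by half while the pruned RNN-T term and the audio-tagging distillation term enter at full weight. A quick check of that apparent weighting (the scale names below are descriptive, not taken from the code):

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, at_scale=1.0):
        # weighting inferred from the logged values in this section
        return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

    # batch 7900 above
    print(round(combined_loss(0.09823, 0.01721, 0.008445), 5))  # -> 0.07477
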
], tot_loss[loss=0.06616, simple_loss=0.08946, pruned_loss=0.01257, audio_tagging_loss=0.008859, over 3040412.05 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:18:13,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2023-11-27 23:18:36,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3259093.3333333335, ans=0.1 2023-11-27 23:18:46,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3259093.3333333335, ans=0.2 2023-11-27 23:19:04,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=22.5 2023-11-27 23:19:21,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3259226.6666666665, ans=0.125 2023-11-27 23:19:28,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3259226.6666666665, ans=0.125 2023-11-27 23:19:46,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3259293.3333333335, ans=0.125 2023-11-27 23:19:48,330 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488900 2023-11-27 23:20:01,486 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7950, loss[loss=0.04461, simple_loss=0.05394, pruned_loss=0.006502, audio_tagging_loss=0.01113, over 13909.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09072, pruned_loss=0.01269, audio_tagging_loss=0.008875, over 3039410.21 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:20:30,300 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 23:20:53,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3259493.3333333335, ans=0.125 2023-11-27 23:21:22,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3259626.6666666665, ans=0.125 2023-11-27 23:21:24,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2023-11-27 23:21:28,881 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488950 2023-11-27 23:21:37,659 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.632e+01 9.434e+01 1.008e+02 1.251e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 23:21:39,758 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8000, loss[loss=0.07171, simple_loss=0.09148, pruned_loss=0.01431, audio_tagging_loss=0.01166, over 15830.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09073, pruned_loss=0.01262, audio_tagging_loss=0.008921, over 3039572.22 frames. 
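
The WARNING entries ("Exclude cut with ID unbalanced/... from training", here for uQjH4tNUZ_g) show the length sanity filter at work: AudioSet clips carry only the dummy transcript, and a 1-second cut has 100 feature frames, which the subsampling front-end reduces to 23, fewer than the 24 BPE tokens of the dummy text, so the transducer cannot align it and the cut is dropped. A minimal sketch of such a filter; the subsampling arithmetic below is one formula that reproduces 100 -> 23 and is an assumption about the exact front-end:

    def frames_after_subsampling(num_frames: int) -> int:
        # two stride-2 stages with a small left context; reproduces 100 -> 23
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # trainable only if every token can consume at least one output frame
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))   # -> 23
    print(keep_cut(100, 24))               # -> False: excluded, as in the warning
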
], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:21:40,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2023-11-27 23:22:37,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-27 23:22:39,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3259893.3333333335, ans=0.0 2023-11-27 23:22:45,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-27 23:23:02,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3259960.0, ans=0.125 2023-11-27 23:23:04,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-27 23:23:06,044 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489000 2023-11-27 23:23:14,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-27 23:23:17,654 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8050, loss[loss=0.06155, simple_loss=0.09005, pruned_loss=0.008367, audio_tagging_loss=0.008153, over 16461.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09031, pruned_loss=0.01243, audio_tagging_loss=0.008931, over 3040258.72 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:23:40,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3260093.3333333335, ans=0.0 2023-11-27 23:23:47,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3260093.3333333335, ans=0.0 2023-11-27 23:24:34,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-27 23:24:40,357 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489050 2023-11-27 23:24:40,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-27 23:24:49,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.685e+01 9.405e+01 9.974e+01 1.162e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 23:24:50,871 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8100, loss[loss=0.03762, simple_loss=0.04386, pruned_loss=0.006173, audio_tagging_loss=0.009515, over 16361.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09082, pruned_loss=0.01261, audio_tagging_loss=0.008883, over 3040074.48 frames. 
], batch size: 66, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:25:16,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3260426.6666666665, ans=0.125 2023-11-27 23:25:24,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3260426.6666666665, ans=0.125 2023-11-27 23:25:43,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3260493.3333333335, ans=0.0 2023-11-27 23:26:00,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3260560.0, ans=0.125 2023-11-27 23:26:08,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2023-11-27 23:26:13,574 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489100 2023-11-27 23:26:13,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3260626.6666666665, ans=0.125 2023-11-27 23:26:24,164 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8150, loss[loss=0.0738, simple_loss=0.09967, pruned_loss=0.01593, audio_tagging_loss=0.008035, over 15289.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.0902, pruned_loss=0.01267, audio_tagging_loss=0.00875, over 3036649.99 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:26:39,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. limit=15.0 2023-11-27 23:27:16,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3260826.6666666665, ans=0.0 2023-11-27 23:27:42,896 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489150 2023-11-27 23:27:50,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.874e+01 9.379e+01 1.019e+02 1.298e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 23:27:52,131 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8200, loss[loss=0.06675, simple_loss=0.0896, pruned_loss=0.01489, audio_tagging_loss=0.007062, over 14570.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09049, pruned_loss=0.01269, audio_tagging_loss=0.008558, over 3045846.74 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:27:54,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.51 vs. limit=22.5 2023-11-27 23:27:54,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2023-11-27 23:27:56,740 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 23:28:32,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3261160.0, ans=0.0 2023-11-27 23:29:04,133 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489200 2023-11-27 23:29:12,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3261360.0, ans=0.125 2023-11-27 23:29:13,158 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8250, loss[loss=0.05422, simple_loss=0.07235, pruned_loss=0.01008, audio_tagging_loss=0.007964, over 15669.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09049, pruned_loss=0.0127, audio_tagging_loss=0.008507, over 3045264.21 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:29:46,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3261493.3333333335, ans=0.0 2023-11-27 23:29:56,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3261560.0, ans=15.0 2023-11-27 23:30:18,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489250 2023-11-27 23:30:27,227 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.905e+01 9.510e+01 1.029e+02 2.089e+02, threshold=1.902e+02, percent-clipped=1.0 2023-11-27 23:30:27,254 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8300, loss[loss=0.06664, simple_loss=0.08675, pruned_loss=0.0146, audio_tagging_loss=0.00866, over 15183.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09008, pruned_loss=0.01256, audio_tagging_loss=0.008604, over 3048844.77 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:31:08,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3261893.3333333335, ans=0.04949747468305833 2023-11-27 23:31:10,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3261893.3333333335, ans=0.2 2023-11-27 23:31:27,087 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489300 2023-11-27 23:31:35,079 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8350, loss[loss=0.07213, simple_loss=0.0955, pruned_loss=0.01616, audio_tagging_loss=0.008226, over 17163.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09035, pruned_loss=0.01252, audio_tagging_loss=0.008557, over 3051500.54 frames. ], batch size: 65, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:32:04,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2023-11-27 23:32:09,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=15.0 2023-11-27 23:32:33,331 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489350 2023-11-27 23:32:46,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.783e+01 9.379e+01 1.006e+02 1.235e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 23:32:46,813 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8400, loss[loss=0.05763, simple_loss=0.0866, pruned_loss=0.007427, audio_tagging_loss=0.00691, over 14440.00 frames. 
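
Note the two bracketed groups in every progress line: loss[...] is the current batch (here 14k-17k frames), while tot_loss[...] is a frame-weighted running average over roughly three million recent frames, which is why it drifts slowly around 0.066 across this stretch. The fractional frame counts (e.g. "over 3051500.54 frames" in batch 8350 above) suggest exponentially decayed sums rather than a hard window; a sketch under that assumption, with an illustrative decay constant:

    class RunningLoss:
        """Frame-weighted running averages with exponential forgetting."""
        def __init__(self, decay=0.995):
            self.decay = decay
            self.sums = {}      # decayed sum of (metric * frames) per metric
            self.frames = 0.0   # decayed frame count; fractional, as in the log

        def update(self, metrics: dict, num_frames: float):
            self.frames = self.frames * self.decay + num_frames
            for name, value in metrics.items():
                prev = self.sums.get(name, 0.0) * self.decay
                self.sums[name] = prev + value * num_frames

        def averages(self):
            return {name: s / self.frames for name, s in self.sums.items()}

    # two batches with values taken from the log (batches 6950 and 7000)
    tracker = RunningLoss()
    tracker.update({"loss": 0.06062, "simple_loss": 0.08151}, 14889)
    tracker.update({"loss": 0.055,   "simple_loss": 0.07785}, 14439)
    print(tracker.averages())   # frame-weighted averages, analogous to tot_loss[...]
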
], tot_loss[loss=0.06576, simple_loss=0.08952, pruned_loss=0.01243, audio_tagging_loss=0.008575, over 3041890.86 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:34:02,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3262426.6666666665, ans=0.1 2023-11-27 23:34:03,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3262426.6666666665, ans=0.0 2023-11-27 23:35:02,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3262493.3333333335, ans=0.125 2023-11-27 23:35:18,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-11-27 23:36:09,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3262626.6666666665, ans=0.035 2023-11-27 23:36:10,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2023-11-27 23:36:28,151 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489400 2023-11-27 23:36:58,984 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8450, loss[loss=0.08329, simple_loss=0.1179, pruned_loss=0.01639, audio_tagging_loss=0.007934, over 14500.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0892, pruned_loss=0.01253, audio_tagging_loss=0.00865, over 3044061.95 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:37:20,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-27 23:37:41,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3262760.0, ans=0.1 2023-11-27 23:37:41,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-27 23:37:45,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3262760.0, ans=0.1 2023-11-27 23:38:17,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3262760.0, ans=0.2 2023-11-27 23:39:28,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3262893.3333333335, ans=0.1 2023-11-27 23:39:32,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3262893.3333333335, ans=0.0 2023-11-27 23:39:32,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3262893.3333333335, ans=0.0 2023-11-27 23:39:37,050 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:39:41,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.43 vs. 
limit=15.0 2023-11-27 23:40:19,359 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489450 2023-11-27 23:40:53,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.777e+01 9.452e+01 1.015e+02 1.471e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 23:40:53,266 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8500, loss[loss=0.06634, simple_loss=0.08429, pruned_loss=0.01284, audio_tagging_loss=0.01136, over 15691.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08942, pruned_loss=0.01249, audio_tagging_loss=0.008743, over 3047643.24 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:42:52,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3263160.0, ans=0.125 2023-11-27 23:44:11,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3263293.3333333335, ans=0.125 2023-11-27 23:44:18,189 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489500 2023-11-27 23:44:42,870 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8550, loss[loss=0.05881, simple_loss=0.0833, pruned_loss=0.008791, audio_tagging_loss=0.00837, over 14444.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08908, pruned_loss=0.01237, audio_tagging_loss=0.008734, over 3052099.38 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:44:47,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3263360.0, ans=0.0 2023-11-27 23:44:50,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3263360.0, ans=0.125 2023-11-27 23:45:00,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3263360.0, ans=0.05 2023-11-27 23:45:03,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3263360.0, ans=0.04949747468305833 2023-11-27 23:45:34,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2023-11-27 23:45:34,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2023-11-27 23:45:52,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3263493.3333333335, ans=0.2 2023-11-27 23:46:19,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3263560.0, ans=0.0 2023-11-27 23:46:35,857 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489550 2023-11-27 23:46:36,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. 
limit=6.0 2023-11-27 23:46:46,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3263626.6666666665, ans=0.1 2023-11-27 23:46:46,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3263626.6666666665, ans=0.0 2023-11-27 23:46:50,758 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.830e+01 9.577e+01 1.042e+02 1.217e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 23:46:50,818 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8600, loss[loss=0.07849, simple_loss=0.1082, pruned_loss=0.01604, audio_tagging_loss=0.008366, over 14977.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09015, pruned_loss=0.01262, audio_tagging_loss=0.008798, over 3049243.63 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:48:06,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3263893.3333333335, ans=0.0 2023-11-27 23:48:30,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2023-11-27 23:48:39,196 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489600 2023-11-27 23:48:52,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3264026.6666666665, ans=0.5 2023-11-27 23:48:54,435 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8650, loss[loss=0.07004, simple_loss=0.09306, pruned_loss=0.01579, audio_tagging_loss=0.007724, over 14140.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09038, pruned_loss=0.01269, audio_tagging_loss=0.00878, over 3042142.01 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:48:57,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3264026.6666666665, ans=0.2 2023-11-27 23:49:37,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3264093.3333333335, ans=0.125 2023-11-27 23:49:37,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3264093.3333333335, ans=0.0 2023-11-27 23:50:05,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. 
limit=15.0 2023-11-27 23:50:21,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3264226.6666666665, ans=0.0 2023-11-27 23:50:24,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3264226.6666666665, ans=0.0 2023-11-27 23:50:38,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3264293.3333333335, ans=0.125 2023-11-27 23:50:44,260 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489650 2023-11-27 23:50:58,418 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 8.922e+01 9.759e+01 1.039e+02 1.261e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 23:50:58,523 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8700, loss[loss=0.07369, simple_loss=0.1114, pruned_loss=0.009534, audio_tagging_loss=0.008482, over 15017.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09069, pruned_loss=0.01268, audio_tagging_loss=0.008804, over 3051956.57 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:51:13,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=12.0 2023-11-27 23:51:24,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3264426.6666666665, ans=0.125 2023-11-27 23:51:27,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2023-11-27 23:52:03,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3264493.3333333335, ans=0.0 2023-11-27 23:52:30,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2023-11-27 23:52:45,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3264626.6666666665, ans=0.125 2023-11-27 23:52:47,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489700 2023-11-27 23:53:01,247 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8750, loss[loss=0.05824, simple_loss=0.07529, pruned_loss=0.01031, audio_tagging_loss=0.01028, over 14457.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09127, pruned_loss=0.01274, audio_tagging_loss=0.008896, over 3059716.83 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:54:14,288 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:54:51,332 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489750 2023-11-27 23:54:58,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3264960.0, ans=0.0 2023-11-27 23:55:06,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.769e+01 9.393e+01 1.008e+02 1.168e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 23:55:06,050 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8800, loss[loss=0.06056, simple_loss=0.07988, pruned_loss=0.01081, audio_tagging_loss=0.009809, over 14253.00 frames. 
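
The "Whitening: name=..., metric=X vs. limit=Y" entries track how far each module's activations are from being white, i.e. decorrelated channels with equal variance; a penalty only engages when the metric exceeds its limit, and in this stretch every logged metric stays below its limit. One standard whiteness measure with the right properties is the eigenvalue-spread ratio num_channels * tr(C^2) / tr(C)^2 of the channel covariance C, which is 1.0 exactly when C is a multiple of the identity and grows as variance concentrates in a few directions. The sketch below uses that measure as an assumption about, not a copy of, the exact metric in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one group
        x = x - x.mean(dim=0, keepdim=True)     # center the channels
        cov = (x.T @ x) / x.shape[0]            # channel covariance C
        num_channels = x.shape[1]
        # (mean of squared eigenvalues) / (squared mean eigenvalue); >= 1.0
        return (num_channels * (cov @ cov).trace() / cov.trace() ** 2).item()

    x = torch.randn(1000, 256)                  # roughly white activations
    print(whitening_metric(x))                  # ~1.3, well under a limit of 15.0
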
], tot_loss[loss=0.06738, simple_loss=0.09157, pruned_loss=0.0127, audio_tagging_loss=0.008896, over 3054748.53 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:55:07,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2023-11-27 23:55:11,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3265026.6666666665, ans=0.125 2023-11-27 23:55:11,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3265026.6666666665, ans=0.07 2023-11-27 23:55:30,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3265093.3333333335, ans=0.0 2023-11-27 23:56:15,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3265160.0, ans=0.1 2023-11-27 23:56:24,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3265226.6666666665, ans=0.0 2023-11-27 23:56:29,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3265226.6666666665, ans=0.125 2023-11-27 23:56:52,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489800 2023-11-27 23:56:52,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3265293.3333333335, ans=0.2 2023-11-27 23:57:07,589 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8850, loss[loss=0.06229, simple_loss=0.0879, pruned_loss=0.009836, audio_tagging_loss=0.008502, over 15310.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09175, pruned_loss=0.01257, audio_tagging_loss=0.008926, over 3056971.90 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:57:35,200 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 23:57:43,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3265426.6666666665, ans=0.2 2023-11-27 23:58:19,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3265560.0, ans=0.125 2023-11-27 23:58:45,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3265626.6666666665, ans=0.0 2023-11-27 23:58:50,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489850 2023-11-27 23:59:00,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0 2023-11-27 23:59:03,408 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8900, loss[loss=0.05137, simple_loss=0.06033, pruned_loss=0.01139, audio_tagging_loss=0.009816, over 14485.00 frames. 
], tot_loss[loss=0.06694, simple_loss=0.09133, pruned_loss=0.01244, audio_tagging_loss=0.008831, over 3054837.09 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:59:03,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3265693.3333333335, ans=0.125 2023-11-27 23:59:05,802 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.603e+01 9.158e+01 9.792e+01 1.158e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 23:59:09,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3265693.3333333335, ans=0.0 2023-11-27 23:59:11,952 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:59:22,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3265693.3333333335, ans=0.125 2023-11-27 23:59:50,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3265760.0, ans=0.125 2023-11-28 00:00:46,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=22.5 2023-11-28 00:00:56,416 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489900 2023-11-28 00:01:01,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3265960.0, ans=0.125 2023-11-28 00:01:05,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3265960.0, ans=0.125 2023-11-28 00:01:10,643 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8950, loss[loss=0.08469, simple_loss=0.1116, pruned_loss=0.01821, audio_tagging_loss=0.01066, over 16185.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09173, pruned_loss=0.01269, audio_tagging_loss=0.008642, over 3059869.72 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:01:26,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3266026.6666666665, ans=0.0 2023-11-28 00:02:56,946 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489950 2023-11-28 00:03:08,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3266360.0, ans=0.0 2023-11-28 00:03:10,754 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9000, loss[loss=0.05878, simple_loss=0.09044, pruned_loss=0.006791, audio_tagging_loss=0.006766, over 15376.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09149, pruned_loss=0.0127, audio_tagging_loss=0.00856, over 3058346.92 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:03:10,755 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 00:03:42,444 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0651, 4.9131, 3.7642, 4.2886], device='cuda:3') 2023-11-28 00:04:14,793 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05835, simple_loss=0.05061, pruned_loss=0.005195, audio_tagging_loss=0.02785, over 4681554.00 frames. 
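
The validation block just above ("Computing validation loss" through "Epoch 41, validation: loss=0.05835 ... over 4681554.00 frames") aggregates each loss component over the entire dev set, weighted by frames, with gradients disabled; note how the audio_tagging_loss component (0.02785) carries a much larger share of the validation loss than of the training loss. A minimal sketch of that aggregation, with placeholder model and batch interfaces (compute_losses below is a stand-in, not the real method name):

    import torch

    def compute_validation_loss(model, valid_loader, device):
        model.eval()
        tot = {}            # frame-weighted sum of each loss component
        tot_frames = 0.0
        with torch.no_grad():
            for batch in valid_loader:
                # placeholder: per-batch loss components plus the frame count
                losses, num_frames = model.compute_losses(batch, device)
                tot_frames += num_frames
                for name, value in losses.items():
                    tot[name] = tot.get(name, 0.0) + value * num_frames
        model.train()       # resume training mode afterwards
        return {name: s / tot_frames for name, s in tot.items()}, tot_frames
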
2023-11-28 00:04:14,794 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 00:04:16,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.840e+01 9.454e+01 9.905e+01 1.337e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 00:04:17,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3266360.0, ans=0.1 2023-11-28 00:04:40,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=22.5 2023-11-28 00:06:02,421 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490000 2023-11-28 00:06:07,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3266626.6666666665, ans=0.0 2023-11-28 00:06:17,838 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9050, loss[loss=0.05728, simple_loss=0.08175, pruned_loss=0.009544, audio_tagging_loss=0.006865, over 15379.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09102, pruned_loss=0.01252, audio_tagging_loss=0.008621, over 3056131.65 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:07:04,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3266826.6666666665, ans=0.5 2023-11-28 00:07:39,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.79 vs. limit=15.0 2023-11-28 00:08:00,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3266960.0, ans=0.1 2023-11-28 00:08:04,806 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490050 2023-11-28 00:08:19,690 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9100, loss[loss=0.06071, simple_loss=0.07898, pruned_loss=0.01031, audio_tagging_loss=0.01091, over 15589.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09062, pruned_loss=0.0125, audio_tagging_loss=0.00858, over 3060253.97 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:08:20,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3267026.6666666665, ans=0.0 2023-11-28 00:08:22,035 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.819e+01 9.395e+01 1.013e+02 1.222e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 00:09:31,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-11-28 00:10:03,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=15.0 2023-11-28 00:10:04,174 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490100 2023-11-28 00:10:17,644 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9150, loss[loss=0.06498, simple_loss=0.08938, pruned_loss=0.01262, audio_tagging_loss=0.007677, over 14948.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09079, pruned_loss=0.01254, audio_tagging_loss=0.008537, over 3058861.33 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:10:33,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3267360.0, ans=0.05 2023-11-28 00:10:42,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3267426.6666666665, ans=0.125 2023-11-28 00:11:16,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3267493.3333333335, ans=0.125 2023-11-28 00:11:18,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3267493.3333333335, ans=0.125 2023-11-28 00:11:23,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3267560.0, ans=0.2 2023-11-28 00:11:27,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3267560.0, ans=0.0 2023-11-28 00:11:37,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3267560.0, ans=0.125 2023-11-28 00:11:42,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-28 00:11:48,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3267626.6666666665, ans=0.125 2023-11-28 00:11:56,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2023-11-28 00:11:57,869 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490150 2023-11-28 00:12:08,765 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9200, loss[loss=0.05925, simple_loss=0.07536, pruned_loss=0.009551, audio_tagging_loss=0.01202, over 14360.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0907, pruned_loss=0.01251, audio_tagging_loss=0.008517, over 3059385.92 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:12:11,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.944e+01 9.391e+01 1.026e+02 1.333e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 00:12:17,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2023-11-28 00:13:08,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=15.0 2023-11-28 00:13:20,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=22.5 2023-11-28 00:13:56,204 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490200 2023-11-28 00:13:56,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=12.0 2023-11-28 00:14:13,501 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9250, loss[loss=0.063, simple_loss=0.09854, pruned_loss=0.008552, audio_tagging_loss=0.005177, over 14875.00 frames. 
], tot_loss[loss=0.06623, simple_loss=0.09054, pruned_loss=0.01252, audio_tagging_loss=0.008437, over 3062606.29 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:14:33,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3268026.6666666665, ans=0.125 2023-11-28 00:14:39,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3268093.3333333335, ans=0.1 2023-11-28 00:14:51,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3268093.3333333335, ans=0.0 2023-11-28 00:15:29,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2023-11-28 00:15:37,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3268226.6666666665, ans=0.1 2023-11-28 00:15:44,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3268226.6666666665, ans=0.0 2023-11-28 00:15:57,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3268293.3333333335, ans=0.2 2023-11-28 00:16:09,117 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490250 2023-11-28 00:16:23,620 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9300, loss[loss=0.06597, simple_loss=0.08713, pruned_loss=0.01367, audio_tagging_loss=0.008738, over 14284.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09046, pruned_loss=0.01246, audio_tagging_loss=0.00854, over 3060728.23 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:16:27,348 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.477e+01 9.136e+01 9.623e+01 1.227e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-28 00:16:29,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3268360.0, ans=0.125 2023-11-28 00:16:47,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3268426.6666666665, ans=0.0 2023-11-28 00:16:57,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3268426.6666666665, ans=0.125 2023-11-28 00:17:41,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3268560.0, ans=0.07 2023-11-28 00:18:01,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3268626.6666666665, ans=0.125 2023-11-28 00:18:01,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3268626.6666666665, ans=0.125 2023-11-28 00:18:10,840 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490300 2023-11-28 00:18:17,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=15.0 2023-11-28 00:18:23,604 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9350, loss[loss=0.05243, simple_loss=0.07017, pruned_loss=0.008147, audio_tagging_loss=0.009196, over 15586.00 frames. 
], tot_loss[loss=0.06653, simple_loss=0.09089, pruned_loss=0.01248, audio_tagging_loss=0.008609, over 3057816.97 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:18:30,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3268693.3333333335, ans=0.0 2023-11-28 00:19:05,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2023-11-28 00:19:05,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-28 00:19:32,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3268893.3333333335, ans=0.1 2023-11-28 00:19:34,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3268893.3333333335, ans=0.0 2023-11-28 00:20:01,450 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490350 2023-11-28 00:20:14,297 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9400, loss[loss=0.0516, simple_loss=0.07384, pruned_loss=0.006952, audio_tagging_loss=0.007728, over 15189.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09095, pruned_loss=0.01256, audio_tagging_loss=0.008658, over 3052198.96 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:20:18,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.645e+01 9.230e+01 9.959e+01 1.190e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-28 00:20:45,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3269093.3333333335, ans=0.2 2023-11-28 00:20:51,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3269093.3333333335, ans=0.1 2023-11-28 00:21:06,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3269160.0, ans=0.1 2023-11-28 00:21:08,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3269160.0, ans=0.125 2023-11-28 00:21:36,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3269226.6666666665, ans=0.125 2023-11-28 00:21:44,976 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:21:53,624 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490400 2023-11-28 00:22:05,779 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9450, loss[loss=0.04992, simple_loss=0.06435, pruned_loss=0.009133, audio_tagging_loss=0.008615, over 15710.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09097, pruned_loss=0.0125, audio_tagging_loss=0.008761, over 3041335.16 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:22:05,879 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
2023-11-28 00:22:21,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2023-11-28 00:23:32,621 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:23:45,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490450 2023-11-28 00:23:58,252 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9500, loss[loss=0.06125, simple_loss=0.07669, pruned_loss=0.01189, audio_tagging_loss=0.01102, over 14208.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09078, pruned_loss=0.01259, audio_tagging_loss=0.008824, over 3043126.51 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:24:04,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.586e+01 9.559e+01 1.044e+02 1.238e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 00:24:12,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3269693.3333333335, ans=15.0 2023-11-28 00:24:13,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3269693.3333333335, ans=0.0 2023-11-28 00:24:37,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-28 00:25:02,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3269893.3333333335, ans=0.125 2023-11-28 00:25:25,050 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490500 2023-11-28 00:25:32,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3269960.0, ans=0.125 2023-11-28 00:25:35,575 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9550, loss[loss=0.07163, simple_loss=0.09728, pruned_loss=0.01362, audio_tagging_loss=0.00937, over 15004.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09098, pruned_loss=0.01264, audio_tagging_loss=0.008897, over 3046269.01 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:26:02,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3270093.3333333335, ans=0.0 2023-11-28 00:26:34,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3270226.6666666665, ans=0.1 2023-11-28 00:26:35,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3270226.6666666665, ans=0.125 2023-11-28 00:26:41,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3270293.3333333335, ans=0.125 2023-11-28 00:26:49,690 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490550 2023-11-28 00:26:55,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0
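
The scaling.py:1022 "Whitening" lines are a diagnostic on how far a module's activations are from having a white (scaled-identity) covariance within each channel group: the metric is 1.0 for perfectly white features and grows as the covariance becomes less isotropic, and it is reported against a scheduled limit (metric=9.68 vs. limit=15.0 above), beyond which a corrective penalty kicks in. The following is a paraphrase of the idea, not the exact scaling.py implementation:

    # Sketch of a whitening metric: 1.0 iff the group covariance is a
    # multiple of the identity, larger as it becomes less isotropic.
    # This paraphrases the idea behind the logged diagnostic; the real
    # module also applies a penalty when the metric exceeds the limit.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        x = x.reshape(-1, x.shape[-1])
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups  # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)        # centre each group
        cov = x.transpose(1, 2) @ x                # (num_groups, cpg, cpg)
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (cov ** 2).sum() / (num_groups * cpg)
        return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

    print(whitening_metric(torch.randn(10000, 256)))  # ~1.0 for white noise
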
2023-11-28 00:26:58,236 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9600, loss[loss=0.07005, simple_loss=0.09518, pruned_loss=0.01149, audio_tagging_loss=0.01097, over 16196.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09134, pruned_loss=0.01265, audio_tagging_loss=0.008865, over 3053235.19 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:27:02,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.793e+01 9.266e+01 1.006e+02 1.228e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 00:27:45,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3270560.0, ans=0.125 2023-11-28 00:27:59,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490600 2023-11-28 00:28:01,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0 2023-11-28 00:28:04,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3270626.6666666665, ans=0.125 2023-11-28 00:28:08,095 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9650, loss[loss=0.07644, simple_loss=0.1018, pruned_loss=0.01753, audio_tagging_loss=0.007989, over 15619.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09181, pruned_loss=0.01282, audio_tagging_loss=0.008875, over 3050296.63 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:28:32,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-28 00:28:47,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=22.5 2023-11-28 00:28:53,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3270893.3333333335, ans=0.1 2023-11-28 00:28:53,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3270893.3333333335, ans=0.125 2023-11-28 00:29:05,838 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490650 2023-11-28 00:29:14,533 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9700, loss[loss=0.06766, simple_loss=0.08785, pruned_loss=0.01471, audio_tagging_loss=0.009027, over 15098.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09167, pruned_loss=0.0129, audio_tagging_loss=0.008758, over 3045617.55 frames.
], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:29:14,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3271026.6666666665, ans=0.0 2023-11-28 00:29:18,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.733e+01 9.513e+01 1.030e+02 1.343e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 00:29:45,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3271160.0, ans=0.125 2023-11-28 00:30:00,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3271226.6666666665, ans=0.2 2023-11-28 00:30:08,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3271293.3333333335, ans=0.125 2023-11-28 00:30:10,943 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490700 2023-11-28 00:30:18,803 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9750, loss[loss=0.08129, simple_loss=0.1125, pruned_loss=0.01593, audio_tagging_loss=0.009107, over 14575.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.0914, pruned_loss=0.01273, audio_tagging_loss=0.00869, over 3049735.49 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:30:28,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3271360.0, ans=0.0 2023-11-28 00:30:29,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3271426.6666666665, ans=0.125 2023-11-28 00:30:36,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3271426.6666666665, ans=0.125 2023-11-28 00:30:44,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3271493.3333333335, ans=0.2 2023-11-28 00:30:50,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3271493.3333333335, ans=0.0 2023-11-28 00:31:07,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3271626.6666666665, ans=0.0 2023-11-28 00:31:13,614 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490750 2023-11-28 00:31:20,498 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9800, loss[loss=0.07105, simple_loss=0.09616, pruned_loss=0.0135, audio_tagging_loss=0.009469, over 15129.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09127, pruned_loss=0.01266, audio_tagging_loss=0.008686, over 3041380.81 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:31:23,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.662e+01 9.364e+01 1.024e+02 1.595e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 00:31:30,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3271693.3333333335, ans=0.125 2023-11-28 00:31:37,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3271760.0, ans=0.125 2023-11-28 00:31:45,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3271826.6666666665, ans=0.0 2023-11-28 00:32:06,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3271893.3333333335, ans=0.125 2023-11-28 00:32:13,080 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490800 2023-11-28 00:32:15,754 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:32:20,870 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9850, loss[loss=0.07244, simple_loss=0.09862, pruned_loss=0.01526, audio_tagging_loss=0.007872, over 15055.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09214, pruned_loss=0.01282, audio_tagging_loss=0.008576, over 3045604.21 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:32:21,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3272026.6666666665, ans=0.125 2023-11-28 00:32:24,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3272026.6666666665, ans=0.05 2023-11-28 00:32:28,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3272026.6666666665, ans=0.0 2023-11-28 00:32:43,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0 2023-11-28 00:32:46,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3272160.0, ans=0.0 2023-11-28 00:32:56,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3272226.6666666665, ans=0.125 2023-11-28 00:32:59,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3272226.6666666665, ans=0.0 2023-11-28 00:33:12,908 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490850 2023-11-28 00:33:20,804 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9900, loss[loss=0.05764, simple_loss=0.06922, pruned_loss=0.01074, audio_tagging_loss=0.0123, over 14112.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09127, pruned_loss=0.01277, audio_tagging_loss=0.008625, over 3044886.39 frames. 
], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:33:22,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3272360.0, ans=0.0 2023-11-28 00:33:24,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.033e+01 9.485e+01 1.050e+02 1.243e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 00:33:37,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3272426.6666666665, ans=0.125 2023-11-28 00:33:51,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3272493.3333333335, ans=0.0 2023-11-28 00:34:05,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=22.5 2023-11-28 00:34:08,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3272626.6666666665, ans=0.125 2023-11-28 00:34:11,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490900 2023-11-28 00:34:18,415 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9950, loss[loss=0.08138, simple_loss=0.1152, pruned_loss=0.01636, audio_tagging_loss=0.007414, over 14376.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09158, pruned_loss=0.01299, audio_tagging_loss=0.008598, over 3050169.23 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:34:18,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3272693.3333333335, ans=0.0 2023-11-28 00:34:44,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3272826.6666666665, ans=0.1 2023-11-28 00:34:46,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3272826.6666666665, ans=0.125 2023-11-28 00:34:49,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2023-11-28 00:35:04,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3272960.0, ans=10.0 2023-11-28 00:35:05,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=22.5 2023-11-28 00:35:09,066 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490950 2023-11-28 00:35:10,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0 2023-11-28 00:35:16,006 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10000, loss[loss=0.07882, simple_loss=0.1053, pruned_loss=0.01727, audio_tagging_loss=0.008911, over 15023.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09168, pruned_loss=0.01277, audio_tagging_loss=0.00844, over 3049509.39 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:35:19,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.605e+01 9.101e+01 9.831e+01 1.246e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-28 00:35:37,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3273093.3333333335, ans=0.125 2023-11-28 00:35:39,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3273160.0, ans=0.0 2023-11-28 00:35:48,473 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-28 00:35:49,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3273226.6666666665, ans=0.1 2023-11-28 00:36:00,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3273293.3333333335, ans=0.0 2023-11-28 00:36:06,563 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491000 2023-11-28 00:36:13,244 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10050, loss[loss=0.059, simple_loss=0.07944, pruned_loss=0.01018, audio_tagging_loss=0.009094, over 15851.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.0921, pruned_loss=0.01277, audio_tagging_loss=0.008443, over 3057770.48 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:36:23,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3273360.0, ans=0.0 2023-11-28 00:36:33,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2023-11-28 00:36:40,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3273493.3333333335, ans=0.0 2023-11-28 00:36:55,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3273560.0, ans=0.125 2023-11-28 00:37:01,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3273626.6666666665, ans=0.2 2023-11-28 00:37:04,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3273626.6666666665, ans=0.125 2023-11-28 00:37:05,418 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491050 2023-11-28 00:37:11,859 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10100, loss[loss=0.06922, simple_loss=0.09233, pruned_loss=0.01279, audio_tagging_loss=0.01027, over 14720.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09136, pruned_loss=0.01265, audio_tagging_loss=0.008614, over 3048283.89 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:37:12,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0
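
The optim.py:476 lines summarize the recent distribution of gradient norms (min/25%/median/75%/max) and clip against a threshold of Clipping_scale times the median: in the clipping entry above, 2.0 * 9.101e+01 = 1.820e+02, exactly the logged threshold; percent-clipped=0.0 says no batch in the reporting window exceeded it. A minimal sketch of such a clipper follows; the history length and reporting format are assumptions, only the threshold rule is taken from the logged numbers:

    # Sketch: median-based gradient clipping, as suggested by the logged
    # "threshold = Clipping_scale * median grad-norm" relationship.
    # History size and reporting cadence are assumptions.
    from collections import deque
    import torch

    class QuartileClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)
            self.clipped = 0
            self.total = 0

        def clip_(self, params) -> None:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms.append(norm.item())
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            self.total += 1
            if norm > threshold:           # rescale only oversized steps
                self.clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)

        def report(self) -> str:
            t = torch.tensor(sorted(self.norms))
            qs = " ".join(f"{t.quantile(q).item():.3e}"
                          for q in (0.0, 0.25, 0.5, 0.75, 1.0))
            return (f"grad-norm quartiles {qs}, "
                    f"percent-clipped={100.0 * self.clipped / self.total:.1f}")
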
2023-11-28 00:37:17,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.687e+01 9.300e+01 1.008e+02 1.276e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:37:17,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3273693.3333333335, ans=0.125 2023-11-28 00:37:44,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-28 00:37:50,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3273893.3333333335, ans=0.0 2023-11-28 00:37:51,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3273893.3333333335, ans=0.2 2023-11-28 00:37:55,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3273893.3333333335, ans=0.125 2023-11-28 00:38:01,027 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:38:02,172 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491100 2023-11-28 00:38:06,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3273960.0, ans=0.07 2023-11-28 00:38:09,087 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10150, loss[loss=0.06276, simple_loss=0.08017, pruned_loss=0.01469, audio_tagging_loss=0.007977, over 15672.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.0913, pruned_loss=0.01276, audio_tagging_loss=0.008739, over 3045462.26 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:38:20,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3274093.3333333335, ans=0.1 2023-11-28 00:38:39,090 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:38:45,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-11-28 00:38:48,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2023-11-28 00:38:59,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491150 2023-11-28 00:39:05,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0
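
The grad_scale field in these entries is the fp16 loss-scaling factor, and its movement between 32.0, 16.0, and 8.0 is the signature of dynamic loss scaling: the scale is halved whenever a step produces inf/nan gradients and grown back after a run of finite ones. The stock PyTorch GradScaler below shows the mechanism; whether train_asr.py uses this exact class or its own wrapper is an assumption:

    # Sketch: dynamic fp16 loss scaling with PyTorch's GradScaler.
    # backoff_factor=0.5 halves the scale on overflow (32 -> 16 -> 8);
    # growth_factor=2.0 doubles it back after growth_interval good steps.
    import torch

    model = torch.nn.Linear(10, 10).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, backoff_factor=0.5,
                                       growth_factor=2.0, growth_interval=2000)

    for _ in range(3):
        opt.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(torch.randn(4, 10, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.step(opt)   # silently skipped if the grads overflowed
        scaler.update()    # adjusts the scale reported as grad_scale
        print(scaler.get_scale())
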
2023-11-28 00:39:06,391 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10200, loss[loss=0.06199, simple_loss=0.08381, pruned_loss=0.01365, audio_tagging_loss=0.006434, over 14688.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09096, pruned_loss=0.01271, audio_tagging_loss=0.008775, over 3042853.97 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:39:12,515 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.857e+01 9.633e+01 1.053e+02 1.293e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 00:39:25,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3274426.6666666665, ans=0.125 2023-11-28 00:39:30,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3274493.3333333335, ans=0.0 2023-11-28 00:39:31,016 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:39:51,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3274626.6666666665, ans=0.2 2023-11-28 00:39:57,838 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491200 2023-11-28 00:40:00,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3274626.6666666665, ans=0.1 2023-11-28 00:40:05,362 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10250, loss[loss=0.05885, simple_loss=0.07591, pruned_loss=0.01035, audio_tagging_loss=0.01055, over 15078.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.0902, pruned_loss=0.01255, audio_tagging_loss=0.008813, over 3047346.48 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:40:12,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274693.3333333335, ans=0.1 2023-11-28 00:40:23,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=22.5 2023-11-28 00:40:28,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3274826.6666666665, ans=0.0 2023-11-28 00:40:55,935 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491250 2023-11-28 00:40:57,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.29 vs. limit=10.0 2023-11-28 00:41:02,317 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10300, loss[loss=0.06175, simple_loss=0.0798, pruned_loss=0.0122, audio_tagging_loss=0.009648, over 15295.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0905, pruned_loss=0.01272, audio_tagging_loss=0.008871, over 3051711.02 frames.
], batch size: 58, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:41:08,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.570e+01 8.818e+01 9.627e+01 1.031e+02 1.268e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 00:41:13,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3275093.3333333335, ans=0.125 2023-11-28 00:41:14,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2023-11-28 00:41:21,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3275093.3333333335, ans=0.125 2023-11-28 00:41:21,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3275093.3333333335, ans=0.2 2023-11-28 00:41:23,762 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:41:31,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3275160.0, ans=0.0 2023-11-28 00:41:32,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2023-11-28 00:41:53,415 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491300 2023-11-28 00:41:54,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3275293.3333333335, ans=0.0 2023-11-28 00:41:59,903 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10350, loss[loss=0.08793, simple_loss=0.1242, pruned_loss=0.02016, audio_tagging_loss=0.005642, over 15877.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09086, pruned_loss=0.01284, audio_tagging_loss=0.008856, over 3054400.91 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:42:16,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3275426.6666666665, ans=0.125 2023-11-28 00:42:19,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3275426.6666666665, ans=0.0 2023-11-28 00:42:19,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3275426.6666666665, ans=0.125 2023-11-28 00:42:26,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3275493.3333333335, ans=0.125 2023-11-28 00:42:33,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3275560.0, ans=0.125 2023-11-28 00:42:50,266 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491350 2023-11-28 00:42:56,800 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10400, loss[loss=0.05908, simple_loss=0.0828, pruned_loss=0.009299, audio_tagging_loss=0.008379, over 15052.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09051, pruned_loss=0.01264, audio_tagging_loss=0.00898, over 3056921.48 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:43:01,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.68 vs. limit=10.0 2023-11-28 00:43:02,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.640e+01 9.257e+01 1.001e+02 1.271e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 00:43:15,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3275760.0, ans=0.0 2023-11-28 00:43:17,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3275760.0, ans=0.0 2023-11-28 00:43:19,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3275826.6666666665, ans=0.0 2023-11-28 00:43:22,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3275826.6666666665, ans=0.1 2023-11-28 00:43:29,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2023-11-28 00:43:40,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3275893.3333333335, ans=0.125 2023-11-28 00:43:46,895 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491400 2023-11-28 00:43:48,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3275960.0, ans=0.0 2023-11-28 00:43:54,164 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10450, loss[loss=0.06634, simple_loss=0.09091, pruned_loss=0.01305, audio_tagging_loss=0.00783, over 16082.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09094, pruned_loss=0.01257, audio_tagging_loss=0.008845, over 3056199.07 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:44:08,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3276093.3333333335, ans=0.125 2023-11-28 00:44:17,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276160.0, ans=0.1 2023-11-28 00:44:21,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3276160.0, ans=0.0 2023-11-28 00:44:25,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3276160.0, ans=10.0 2023-11-28 00:44:26,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2023-11-28 00:44:32,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276226.6666666665, ans=0.1 2023-11-28 00:44:40,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0
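
The scaling.py:213 lines print scheduled hyperparameters: each named quantity (dropout_p, skip_rate, balancer prob, scale_min, ...) is a function of batch_count rather than a constant, and by batch_count of about 3.27e6 every schedule here has settled at its final value, which is why the printed ans fields never move. A sketch of such a piecewise-linear schedule follows; the breakpoints are invented for illustration, and only the batch_count-driven behaviour is taken from the log:

    # Sketch of a ScheduledFloat-style hyperparameter: piecewise-linear in
    # batch_count, constant outside the first and last breakpoints.
    # The breakpoints below are made up for illustration.
    class ScheduledFloatSketch:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    print(skip_rate.value(3275760.0))  # 0.0 -- long past the last breakpoint
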
2023-11-28 00:44:44,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491450 2023-11-28 00:44:47,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0 2023-11-28 00:44:51,493 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10500, loss[loss=0.0728, simple_loss=0.1045, pruned_loss=0.01269, audio_tagging_loss=0.007862, over 15386.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09069, pruned_loss=0.01262, audio_tagging_loss=0.008767, over 3055585.56 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:44:56,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.695e+01 9.363e+01 1.021e+02 1.243e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 00:45:20,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3276493.3333333335, ans=0.125 2023-11-28 00:45:23,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3276493.3333333335, ans=0.125 2023-11-28 00:45:33,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3276560.0, ans=0.125 2023-11-28 00:45:41,430 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491500 2023-11-28 00:45:48,534 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10550, loss[loss=0.05456, simple_loss=0.07922, pruned_loss=0.008862, audio_tagging_loss=0.006089, over 15511.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09006, pruned_loss=0.0124, audio_tagging_loss=0.00871, over 3047083.08 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:46:03,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0 2023-11-28 00:46:05,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3276760.0, ans=0.0 2023-11-28 00:46:07,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3276760.0, ans=22.5 2023-11-28 00:46:17,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3276826.6666666665, ans=0.0 2023-11-28 00:46:20,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3276826.6666666665, ans=0.125 2023-11-28 00:46:39,175 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491550 2023-11-28 00:46:39,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3276960.0, ans=0.2 2023-11-28 00:46:45,607 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10600, loss[loss=0.06747, simple_loss=0.08763, pruned_loss=0.01296, audio_tagging_loss=0.01069, over 15328.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08959, pruned_loss=0.01235, audio_tagging_loss=0.008706, over 3050084.42 frames.
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:46:51,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.682e+01 9.138e+01 9.881e+01 1.216e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 00:47:14,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3277160.0, ans=0.5 2023-11-28 00:47:36,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491600 2023-11-28 00:47:36,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3277293.3333333335, ans=0.2 2023-11-28 00:47:41,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3277293.3333333335, ans=0.125 2023-11-28 00:47:41,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3277293.3333333335, ans=0.025 2023-11-28 00:47:44,080 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10650, loss[loss=0.06229, simple_loss=0.07668, pruned_loss=0.0125, audio_tagging_loss=0.01145, over 15293.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08944, pruned_loss=0.01236, audio_tagging_loss=0.008687, over 3052756.23 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:47:51,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3277360.0, ans=0.125 2023-11-28 00:48:08,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3277493.3333333335, ans=0.2 2023-11-28 00:48:12,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3277493.3333333335, ans=0.2 2023-11-28 00:48:19,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. limit=10.0 2023-11-28 00:48:26,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3277560.0, ans=0.125 2023-11-28 00:48:34,199 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491650 2023-11-28 00:48:34,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2023-11-28 00:48:35,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3277626.6666666665, ans=0.025 2023-11-28 00:48:36,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3277626.6666666665, ans=0.0 2023-11-28 00:48:41,245 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10700, loss[loss=0.06976, simple_loss=0.09485, pruned_loss=0.01271, audio_tagging_loss=0.009626, over 15287.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08926, pruned_loss=0.01233, audio_tagging_loss=0.008695, over 3049507.78 frames. 
], batch size: 55, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 00:48:42,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3277693.3333333335, ans=0.0
2023-11-28 00:48:46,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.856e+01 9.300e+01 9.841e+01 1.574e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-28 00:48:50,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3277693.3333333335, ans=0.0
2023-11-28 00:49:12,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3277826.6666666665, ans=0.2
2023-11-28 00:49:16,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3277893.3333333335, ans=0.125
2023-11-28 00:49:26,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3277960.0, ans=0.1
2023-11-28 00:49:30,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491700
2023-11-28 00:49:32,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3277960.0, ans=0.0
2023-11-28 00:49:37,222 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10750, loss[loss=0.06182, simple_loss=0.09001, pruned_loss=0.009322, audio_tagging_loss=0.007493, over 15430.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08944, pruned_loss=0.0123, audio_tagging_loss=0.008617, over 3054932.57 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 00:49:40,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3278026.6666666665, ans=0.05
2023-11-28 00:49:50,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.70 vs. limit=10.0
2023-11-28 00:50:00,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3278160.0, ans=0.07
2023-11-28 00:50:06,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3278160.0, ans=0.0
2023-11-28 00:50:23,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3278293.3333333335, ans=0.125
2023-11-28 00:50:26,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0
2023-11-28 00:50:27,849 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491750
2023-11-28 00:50:35,820 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10800, loss[loss=0.05234, simple_loss=0.07396, pruned_loss=0.007801, audio_tagging_loss=0.007562, over 16045.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08985, pruned_loss=0.01215, audio_tagging_loss=0.008609, over 3060084.56 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0
2023-11-28 00:50:38,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3278360.0, ans=0.125
2023-11-28 00:50:41,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.647e+01 9.300e+01 1.005e+02 1.391e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-28 00:51:00,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3278493.3333333335, ans=0.125
2023-11-28 00:51:12,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3278560.0, ans=0.125
2023-11-28 00:51:15,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3278560.0, ans=0.05
2023-11-28 00:51:18,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3278560.0, ans=0.07
2023-11-28 00:51:22,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0
2023-11-28 00:51:26,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491800
2023-11-28 00:51:33,555 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10850, loss[loss=0.06384, simple_loss=0.08979, pruned_loss=0.01187, audio_tagging_loss=0.007065, over 14379.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08894, pruned_loss=0.01207, audio_tagging_loss=0.008595, over 3061292.54 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0
2023-11-28 00:51:38,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3278693.3333333335, ans=0.125
2023-11-28 00:51:44,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=12.0
2023-11-28 00:51:49,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3278760.0, ans=0.125
2023-11-28 00:51:52,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3278760.0, ans=0.125
2023-11-28 00:51:59,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3278826.6666666665, ans=0.125
2023-11-28 00:52:12,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3278893.3333333335, ans=0.1
2023-11-28 00:52:16,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3278893.3333333335, ans=0.125
2023-11-28 00:52:22,954 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491850
2023-11-28 00:52:25,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2023-11-28 00:52:28,250 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 00:52:29,389 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10900, loss[loss=0.06366, simple_loss=0.08967, pruned_loss=0.01068, audio_tagging_loss=0.00814, over 16239.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08915, pruned_loss=0.01219, audio_tagging_loss=0.008652, over 3058080.96 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0
2023-11-28 00:52:32,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3279026.6666666665, ans=0.125
2023-11-28 00:52:34,709 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.904e+01 9.696e+01 1.053e+02 1.235e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-28 00:52:42,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3279093.3333333335, ans=0.125
2023-11-28 00:52:45,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3279093.3333333335, ans=0.0
2023-11-28 00:53:19,355 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491900
2023-11-28 00:53:26,250 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10950, loss[loss=0.08381, simple_loss=0.1187, pruned_loss=0.01795, audio_tagging_loss=0.006487, over 15564.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08994, pruned_loss=0.01233, audio_tagging_loss=0.008644, over 3052254.75 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0
2023-11-28 00:53:29,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3279360.0, ans=0.0
2023-11-28 00:53:32,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3279360.0, ans=0.05
2023-11-28 00:53:40,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3279426.6666666665, ans=0.2
2023-11-28 00:53:48,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3279493.3333333335, ans=0.0
2023-11-28 00:54:09,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3279560.0, ans=0.125
2023-11-28 00:54:16,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=22.5
2023-11-28 00:54:17,695 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491950
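The WARNING above shows train_asr.py:1481 dropping an AudioSet cut whose placeholder transcript is longer than its acoustic sequence: after 4x subsampling the 100-frame cut has only 23 frames, which cannot be aligned against 24 BPE tokens by the transducer loss. A minimal sketch of such a filter follows; the helper name and the exact subsampling arithmetic are assumptions (the "- 8" offset is chosen so that 100 frames map to the 23 logged above), not the actual icefall code.

    import logging

    def keep_cut_for_transducer(cut, sp, subsampling_factor: int = 4) -> bool:
        """Return False for cuts the transducer loss cannot align.

        An alignment needs at least one encoder frame per token, so a cut is
        dropped when its post-subsampling length is shorter than its token
        sequence. Illustrative sketch; `sp` is a SentencePiece processor.
        """
        num_frames = cut.num_frames  # frames before subsampling
        # Assumed approximation of the encoder-embed receptive field:
        num_frames_sub = (num_frames - 8) // subsampling_factor
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        if num_frames_sub < len(tokens):
            logging.warning(
                f"Exclude cut with ID {cut.id} from training. "
                f"Number of frames (before subsampling): {num_frames}. "
                f"Number of frames (after subsampling): {num_frames_sub}. "
                f"Number of tokens: {len(tokens)}"
            )
            return False
        return True

    # Typically applied lazily to the training cuts:
    # train_cuts = train_cuts.filter(lambda c: keep_cut_for_transducer(c, sp))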
2023-11-28 00:54:24,145 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11000, loss[loss=0.07639, simple_loss=0.1008, pruned_loss=0.01765, audio_tagging_loss=0.00833, over 14426.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08958, pruned_loss=0.01225, audio_tagging_loss=0.008677, over 3048720.99 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2023-11-28 00:54:30,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 8.785e+01 9.323e+01 1.002e+02 1.243e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-28 00:54:35,482 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 00:54:57,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3279893.3333333335, ans=0.0
2023-11-28 00:55:14,287 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492000
2023-11-28 00:55:22,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0
2023-11-28 00:55:22,873 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11050, loss[loss=0.05826, simple_loss=0.07194, pruned_loss=0.01015, audio_tagging_loss=0.01215, over 14988.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08977, pruned_loss=0.01241, audio_tagging_loss=0.008783, over 3047963.44 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0
2023-11-28 00:55:24,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3280026.6666666665, ans=0.025
2023-11-28 00:55:31,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0
2023-11-28 00:55:31,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3280026.6666666665, ans=0.125
2023-11-28 00:55:55,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3280160.0, ans=0.125
2023-11-28 00:56:12,653 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492050
2023-11-28 00:56:19,110 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11100, loss[loss=0.07377, simple_loss=0.09192, pruned_loss=0.01494, audio_tagging_loss=0.01287, over 14579.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08984, pruned_loss=0.0126, audio_tagging_loss=0.00898, over 3052213.74 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0
2023-11-28 00:56:22,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3280360.0, ans=0.125
2023-11-28 00:56:26,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.769e+01 9.313e+01 9.922e+01 1.261e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-28 00:56:29,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3280426.6666666665, ans=0.0
2023-11-28 00:56:41,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3280493.3333333335, ans=0.125
2023-11-28 00:56:46,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3280493.3333333335, ans=0.07
2023-11-28 00:56:47,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3280493.3333333335, ans=0.1
2023-11-28 00:56:48,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0
2023-11-28 00:57:01,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3280560.0, ans=0.2
2023-11-28 00:57:02,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3280560.0, ans=0.2
2023-11-28 00:57:09,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492100
2023-11-28 00:57:13,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3280626.6666666665, ans=0.1
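The optim.py:476 lines report running quartiles of the gradient norm next to a clipping threshold, and in every entry above the threshold is exactly Clipping_scale (2.0) times the median quartile, e.g. 2.0 x 9.313e+01 = 1.863e+02. A self-contained sketch of that policy follows; it reproduces the logged relation but is not the actual optim.py bookkeeping.

    import torch

    class QuartileGradClipper:
        """Clip gradients at clipping_scale * median of recent grad norms."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 100):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms: list[float] = []

        def __call__(self, parameters) -> None:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(
                torch.stack([p.grad.detach().norm(2) for p in params]), 2
            ).item()
            self.norms = (self.norms + [norm])[-self.window :]
            q = torch.quantile(
                torch.tensor(self.norms),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
            )
            threshold = self.clipping_scale * q[2].item()  # 2.0 * median
            print(
                f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                f"{q[0]:.3e} {q[1]:.3e} {q[2]:.3e} {q[3]:.3e} {q[4]:.3e}, "
                f"threshold={threshold:.3e}"
            )
            if norm > threshold:  # rescale all grads on an outlier batch
                for p in params:
                    p.grad.mul_(threshold / norm)

Keying the threshold to a running median rather than a fixed constant makes the clip self-tuning: percent-clipped stays near 0.0 in a healthy run, as in the entries above, while genuine spikes still get damped.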
2023-11-28 00:57:16,938 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11150, loss[loss=0.05736, simple_loss=0.07716, pruned_loss=0.007444, audio_tagging_loss=0.01134, over 16633.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08915, pruned_loss=0.01244, audio_tagging_loss=0.009116, over 3049231.97 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 00:57:20,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3280693.3333333335, ans=0.0
2023-11-28 00:57:41,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3280826.6666666665, ans=0.1
2023-11-28 00:57:54,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3280893.3333333335, ans=0.125
2023-11-28 00:57:54,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3280893.3333333335, ans=0.125
2023-11-28 00:57:57,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3280893.3333333335, ans=0.0
2023-11-28 00:57:58,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3280893.3333333335, ans=0.0
2023-11-28 00:58:07,331 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492150
2023-11-28 00:58:11,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3280960.0, ans=0.0
2023-11-28 00:58:13,838 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11200, loss[loss=0.07437, simple_loss=0.09462, pruned_loss=0.0166, audio_tagging_loss=0.01045, over 14423.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.0892, pruned_loss=0.01252, audio_tagging_loss=0.009227, over 3050581.98 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2023-11-28 00:58:14,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3281026.6666666665, ans=0.95
2023-11-28 00:58:20,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0
2023-11-28 00:58:20,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.951e+01 9.684e+01 1.065e+02 1.269e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-28 00:58:36,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3281160.0, ans=0.1
2023-11-28 00:59:04,322 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492200
2023-11-28 00:59:11,108 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11250, loss[loss=0.05746, simple_loss=0.08102, pruned_loss=0.009335, audio_tagging_loss=0.007618, over 15816.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08855, pruned_loss=0.01232, audio_tagging_loss=0.009151, over 3048526.23 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 00:59:11,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3281360.0, ans=0.0
2023-11-28 00:59:22,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3281426.6666666665, ans=0.0
2023-11-28 00:59:27,564 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-28 00:59:41,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3281493.3333333335, ans=0.0
2023-11-28 01:00:01,793 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492250
2023-11-28 01:00:02,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.30 vs. limit=10.0
2023-11-28 01:00:08,267 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11300, loss[loss=0.07584, simple_loss=0.1077, pruned_loss=0.01263, audio_tagging_loss=0.009363, over 14819.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08939, pruned_loss=0.01245, audio_tagging_loss=0.009015, over 3047094.20 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 01:00:16,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3281693.3333333335, ans=0.2
2023-11-28 01:00:17,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 8.951e+01 9.399e+01 1.008e+02 1.489e+02, threshold=1.880e+02, percent-clipped=0.0
2023-11-28 01:00:26,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3281760.0, ans=0.0
2023-11-28 01:00:31,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3281826.6666666665, ans=0.125
2023-11-28 01:00:48,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3281893.3333333335, ans=0.125
2023-11-28 01:00:58,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3281960.0, ans=0.1
2023-11-28 01:00:59,887 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492300
2023-11-28 01:01:06,436 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11350, loss[loss=0.06462, simple_loss=0.08048, pruned_loss=0.01439, audio_tagging_loss=0.009983, over 14542.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08966, pruned_loss=0.01246, audio_tagging_loss=0.008813, over 3042182.45 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 01:01:08,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3282026.6666666665, ans=0.2
2023-11-28 01:01:24,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3282093.3333333335, ans=0.2
2023-11-28 01:01:56,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492350
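The scaling.py:213 lines print ScheduledFloat values: hyper-parameters such as dropout_p, skip rates and balancer probabilities that are not constants but functions of batch_count, so regularization can be strong early in training and relax later. A toy version is sketched below; the piecewise-linear semantics are inferred from the log and the class is simplified relative to the real scaling.py implementation.

    class ScheduledFloat:
        """A float that depends on batch_count via linear breakpoints.

        Semantics assumed from the log entries above: constant before the
        first and after the last breakpoint, linearly interpolated between.
        """

        def __init__(self, *points: tuple[float, float], name: str = ""):
            self.points = sorted(points)
            self.name = name
            self.batch_count = 0.0

        def __float__(self) -> float:
            x, y = self.points[0]
            if self.batch_count <= x:
                return y
            for x2, y2 in self.points[1:]:
                if self.batch_count <= x2:
                    # linear interpolation between surrounding breakpoints
                    return y + (y2 - y) * (self.batch_count - x) / (x2 - x)
                x, y = x2, y2
            return y  # past the last breakpoint

    p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1),
                       name="feed_forward1.out_proj.dropout_p")
    p.batch_count = 3281960.0
    print(f"ScheduledFloat: name={p.name}, batch_count={p.batch_count}, "
          f"ans={float(p)}")
    # -> ans=0.1; this far into training every schedule above has long
    #    since settled at its final value.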
2023-11-28 01:02:02,938 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11400, loss[loss=0.04641, simple_loss=0.05243, pruned_loss=0.009386, audio_tagging_loss=0.01081, over 13459.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.0899, pruned_loss=0.0125, audio_tagging_loss=0.008703, over 3039671.77 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 01:02:05,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0
2023-11-28 01:02:11,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 8.755e+01 9.421e+01 1.020e+02 1.331e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-28 01:02:14,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0
2023-11-28 01:02:17,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3282426.6666666665, ans=0.1
2023-11-28 01:02:44,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3282560.0, ans=0.125
2023-11-28 01:02:53,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492400
2023-11-28 01:02:58,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3282626.6666666665, ans=0.125
2023-11-28 01:02:59,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3282626.6666666665, ans=0.125
2023-11-28 01:03:00,925 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11450, loss[loss=0.06195, simple_loss=0.08441, pruned_loss=0.01202, audio_tagging_loss=0.007732, over 15901.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08957, pruned_loss=0.01239, audio_tagging_loss=0.008757, over 3039400.52 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 01:03:44,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3282893.3333333335, ans=0.1
2023-11-28 01:03:49,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3282960.0, ans=0.125
2023-11-28 01:03:51,992 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492450
2023-11-28 01:03:52,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3282960.0, ans=15.0
2023-11-28 01:03:58,519 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11500, loss[loss=0.08363, simple_loss=0.1134, pruned_loss=0.0187, audio_tagging_loss=0.00822, over 15481.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08952, pruned_loss=0.01235, audio_tagging_loss=0.008735, over 3045751.91 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 01:03:58,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3283026.6666666665, ans=0.125
2023-11-28 01:04:06,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.678e+01 9.475e+01 1.027e+02 1.615e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-28 01:04:13,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3283093.3333333335, ans=0.0
2023-11-28 01:04:19,857 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.66 vs. limit=12.0
2023-11-28 01:04:24,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3283160.0, ans=0.125
2023-11-28 01:04:33,957 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 01:04:49,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492500
2023-11-28 01:04:54,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3283360.0, ans=0.0
2023-11-28 01:04:54,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3283360.0, ans=0.125
2023-11-28 01:04:55,703 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11550, loss[loss=0.0721, simple_loss=0.09047, pruned_loss=0.01621, audio_tagging_loss=0.01065, over 14130.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.0905, pruned_loss=0.01262, audio_tagging_loss=0.008647, over 3049510.78 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 01:05:15,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3283426.6666666665, ans=0.0
2023-11-28 01:05:15,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3283426.6666666665, ans=0.125
2023-11-28 01:05:32,974 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 01:05:43,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.26 vs. limit=22.5
2023-11-28 01:05:46,485 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492550
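The tot_loss breakdown in these batch lines is internally consistent with a fixed weighting of the three components: loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. At batch 11550 above, 0.5 * 0.0905 + 0.01262 + 0.008647 = 0.06652, the logged loss. A small check follows; the weights are read off the log (the 0.5 matching a simple_loss_scale of 0.5 and the audio-tagging term carrying weight 1.0), and the parameter names are illustrative.

    def combine_losses(
        simple_loss: float,
        pruned_loss: float,
        audio_tagging_loss: float,
        simple_loss_scale: float = 0.5,
        audio_tagging_loss_scale: float = 1.0,
    ) -> float:
        """Weighted total loss; weights inferred from the logged numbers."""
        return (
            simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss
        )

    # Reproduce the Epoch 41, batch 11550 tot_loss line above:
    print(round(combine_losses(0.0905, 0.01262, 0.008647), 5))  # -> 0.06652

Down-weighting the simple (non-pruned) transducer loss relative to the pruned loss is the usual pruned-RNN-T arrangement once training is past warm-up; the audio-tagging distillation term rides along at full weight in this multi-task run.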
2023-11-28 01:05:53,380 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11600, loss[loss=0.07773, simple_loss=0.1054, pruned_loss=0.01496, audio_tagging_loss=0.01006, over 14867.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.0913, pruned_loss=0.0126, audio_tagging_loss=0.00855, over 3051075.99 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0
2023-11-28 01:05:56,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3283693.3333333335, ans=0.125
2023-11-28 01:06:00,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3283693.3333333335, ans=0.1
2023-11-28 01:06:03,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 9.077e+01 9.650e+01 1.023e+02 1.320e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-28 01:06:10,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3283760.0, ans=0.2
2023-11-28 01:06:20,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3283826.6666666665, ans=0.0
2023-11-28 01:06:26,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3283893.3333333335, ans=0.0
2023-11-28 01:06:28,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=22.5
2023-11-28 01:06:29,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3283893.3333333335, ans=0.1
2023-11-28 01:06:43,870 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492600
2023-11-28 01:06:50,999 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11650, loss[loss=0.05777, simple_loss=0.0799, pruned_loss=0.009512, audio_tagging_loss=0.008311, over 15010.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09039, pruned_loss=0.01258, audio_tagging_loss=0.008649, over 3050556.44 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0
2023-11-28 01:07:13,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3284160.0, ans=0.0
2023-11-28 01:07:41,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492650
2023-11-28 01:07:48,463 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11700, loss[loss=0.04904, simple_loss=0.0711, pruned_loss=0.005842, audio_tagging_loss=0.007647, over 15730.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09001, pruned_loss=0.01256, audio_tagging_loss=0.008763, over 3057776.57 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0
2023-11-28 01:07:50,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3284360.0, ans=0.0
2023-11-28 01:07:51,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3284360.0, ans=0.125
2023-11-28 01:07:54,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3284360.0, ans=0.2
2023-11-28 01:07:58,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.938e+01 9.502e+01 1.017e+02 1.872e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-28 01:08:21,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3284560.0, ans=0.125
2023-11-28 01:08:36,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3284626.6666666665, ans=0.0
2023-11-28 01:08:39,178 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492700
2023-11-28 01:08:46,005 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11750, loss[loss=0.0601, simple_loss=0.07997, pruned_loss=0.009291, audio_tagging_loss=0.01083, over 15253.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08946, pruned_loss=0.01238, audio_tagging_loss=0.008726, over 3055869.29 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0
2023-11-28 01:08:47,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3284693.3333333335, ans=0.125
2023-11-28 01:09:07,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3284826.6666666665, ans=0.125
2023-11-28 01:09:36,236 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492750
2023-11-28 01:09:43,362 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11800, loss[loss=0.07146, simple_loss=0.1087, pruned_loss=0.008984, audio_tagging_loss=0.008134, over 15745.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.0896, pruned_loss=0.01248, audio_tagging_loss=0.008783, over 3052618.68 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0
2023-11-28 01:09:53,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.795e+01 9.542e+01 1.022e+02 1.386e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-28 01:10:10,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3285160.0, ans=0.125
2023-11-28 01:10:26,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3285226.6666666665, ans=0.125
2023-11-28 01:10:33,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492800
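The grad_scale field in the batch lines has stepped down from 32.0 to 16.0 to 8.0 and later recovers; that is the dynamic loss scale of fp16 mixed-precision training, which shrinks when a scaled gradient overflows and grows back after a run of clean steps. The generic PyTorch pattern below shows the mechanism; it is standard torch.cuda.amp usage, not this recipe's exact training loop.

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)            # forward in fp16 where safe
        scaler.scale(loss).backward()      # backward on the scaled loss
        scaler.step(optimizer)             # skips the step on inf/nan grads
        scaler.update()                    # halves the scale after an overflow,
                                           # doubles it every growth_interval
                                           # clean steps (hence 8 -> 16 -> 32)
        return loss.detach(), scaler.get_scale()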
2023-11-28 01:10:40,538 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11850, loss[loss=0.0544, simple_loss=0.0779, pruned_loss=0.006678, audio_tagging_loss=0.008774, over 15566.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08877, pruned_loss=0.01231, audio_tagging_loss=0.008821, over 3055744.61 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 8.0
2023-11-28 01:11:06,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3285493.3333333335, ans=0.1
2023-11-28 01:11:17,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3285560.0, ans=0.5
2023-11-28 01:11:20,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3285560.0, ans=0.1
2023-11-28 01:11:22,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3285560.0, ans=0.0
2023-11-28 01:11:31,338 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492850
2023-11-28 01:11:34,214 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 01:11:38,308 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11900, loss[loss=0.07968, simple_loss=0.1054, pruned_loss=0.01764, audio_tagging_loss=0.009349, over 14324.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08941, pruned_loss=0.01228, audio_tagging_loss=0.008857, over 3058648.37 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 8.0
2023-11-28 01:11:47,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3285693.3333333335, ans=0.125
2023-11-28 01:11:48,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.587e+01 9.286e+01 9.981e+01 1.301e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-28 01:12:01,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3285826.6666666665, ans=0.2
2023-11-28 01:12:12,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3285893.3333333335, ans=0.125
2023-11-28 01:12:21,135 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 01:12:29,073 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492900
2023-11-28 01:12:36,108 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11950, loss[loss=0.0752, simple_loss=0.1003, pruned_loss=0.01692, audio_tagging_loss=0.008116, over 15199.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08874, pruned_loss=0.01222, audio_tagging_loss=0.008933, over 3056815.56 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0
2023-11-28 01:12:48,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3286093.3333333335, ans=0.125
2023-11-28 01:12:50,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3286093.3333333335, ans=0.125
2023-11-28 01:13:24,819 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492950
2023-11-28 01:13:30,982 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 12000, loss[loss=0.06725, simple_loss=0.0913, pruned_loss=0.0121, audio_tagging_loss=0.009505, over 14724.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08885, pruned_loss=0.01219, audio_tagging_loss=0.009007, over 3053431.66 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0
2023-11-28 01:13:30,982 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-28 01:13:46,617 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.3073, 4.4060, 3.9981, 4.2739], device='cuda:3')
2023-11-28 01:13:54,962 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3072, 4.2712, 4.4712, 4.4487], device='cuda:3')
2023-11-28 01:13:56,621 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9981, 3.8851, 4.8121, 4.4119], device='cuda:3')
2023-11-28 01:14:03,262 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9405, 3.0812, 2.7874, 2.9041, 3.3669, 2.6524, 3.3719, 2.5480], device='cuda:3')
2023-11-28 01:14:05,667 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05796, simple_loss=0.05063, pruned_loss=0.005209, audio_tagging_loss=0.02743, over 4681554.00 frames.
2023-11-28 01:14:05,668 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-28 01:14:08,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0
2023-11-28 01:14:10,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3286360.0, ans=0.2
2023-11-28 01:14:12,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3286360.0, ans=0.1
2023-11-28 01:14:15,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.802e+01 9.296e+01 1.010e+02 1.466e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-28 01:14:15,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3286426.6666666665, ans=0.125
2023-11-28 01:14:23,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3286426.6666666665, ans=0.0
2023-11-28 01:14:48,428 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 0, loss[loss=0.07103, simple_loss=0.08051, pruned_loss=0.01066, audio_tagging_loss=0.02011, over 14886.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.08051, pruned_loss=0.01066, audio_tagging_loss=0.02011, over 14886.00 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0
2023-11-28 01:14:48,428 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-28 01:15:03,298 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3649, 3.4162, 3.7220, 3.6371], device='cuda:3')
2023-11-28 01:15:22,236 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05771, simple_loss=0.05063, pruned_loss=0.005208, audio_tagging_loss=0.02719, over 4681554.00 frames.
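During each validation pass the zipformer.py:1877 lines dump an attn_weights_entropy tensor per self-attention module, one value per head (four entries for the 4-head stacks, eight for the 8-head one); low entropy flags a head that has collapsed onto very few positions. A plausible way to compute such a diagnostic is sketched below; the exact reduction used in zipformer.py may differ.

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        """Mean entropy of attention distributions, one value per head.

        attn_weights: (num_heads, tgt_len, src_len), rows already softmaxed.
        """
        eps = 1e-20
        entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return entropy.mean(dim=-1)  # average over target positions

    weights = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(weights))
    # Diffuse heads over 100 positions sit near log(100) ~ 4.6, which is the
    # range the healthy heads above report; a head near 0 would be degenerate.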
2023-11-28 01:15:22,237 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-28 01:15:24,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3286513.3333333335, ans=0.025
2023-11-28 01:15:24,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3286513.3333333335, ans=0.125
2023-11-28 01:15:25,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3286513.3333333335, ans=0.0
2023-11-28 01:15:31,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3286513.3333333335, ans=0.5
2023-11-28 01:15:39,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3286580.0, ans=0.0
2023-11-28 01:15:45,688 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493000
2023-11-28 01:15:50,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0
2023-11-28 01:15:51,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0
2023-11-28 01:16:00,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0
2023-11-28 01:16:19,447 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 50, loss[loss=0.08895, simple_loss=0.1186, pruned_loss=0.01664, audio_tagging_loss=0.01304, over 15841.00 frames. ], tot_loss[loss=0.07087, simple_loss=0.08464, pruned_loss=0.01132, audio_tagging_loss=0.01723, over 687944.65 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0
2023-11-28 01:16:40,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0
2023-11-28 01:16:41,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3286980.0, ans=0.09899494936611666
2023-11-28 01:16:43,986 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493050
2023-11-28 01:16:46,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3286980.0, ans=0.125
2023-11-28 01:17:01,282 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.634e+01 1.031e+02 1.127e+02 1.457e+02, threshold=2.062e+02, percent-clipped=0.0
2023-11-28 01:17:09,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3287113.3333333335, ans=0.0
2023-11-28 01:17:16,853 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 100, loss[loss=0.05473, simple_loss=0.06232, pruned_loss=0.005777, audio_tagging_loss=0.01779, over 15189.00 frames. ], tot_loss[loss=0.07199, simple_loss=0.08738, pruned_loss=0.01199, audio_tagging_loss=0.01631, over 1204197.09 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0
2023-11-28 01:17:22,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2023-11-28 01:17:26,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5
2023-11-28 01:17:42,083 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493100
2023-11-28 01:17:47,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3287313.3333333335, ans=0.2
2023-11-28 01:18:09,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=16.70 vs. limit=15.0
2023-11-28 01:18:15,152 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 150, loss[loss=0.06727, simple_loss=0.08753, pruned_loss=0.01262, audio_tagging_loss=0.01088, over 15963.00 frames. ], tot_loss[loss=0.07084, simple_loss=0.08807, pruned_loss=0.01221, audio_tagging_loss=0.0146, over 1614997.98 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0
2023-11-28 01:18:15,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3287513.3333333335, ans=0.125
2023-11-28 01:18:38,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0
2023-11-28 01:18:38,952 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493150
2023-11-28 01:18:39,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3287646.6666666665, ans=0.2
2023-11-28 01:18:51,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=12.0
2023-11-28 01:18:57,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.959e+01 9.616e+01 1.058e+02 1.322e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-28 01:19:12,871 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 200, loss[loss=0.08944, simple_loss=0.1304, pruned_loss=0.01837, audio_tagging_loss=0.005879, over 15386.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.08867, pruned_loss=0.01222, audio_tagging_loss=0.01294, over 1936505.64 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0
2023-11-28 01:19:37,395 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493200
2023-11-28 01:19:43,058 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 01:19:45,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3287980.0, ans=0.1
2023-11-28 01:20:00,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3288113.3333333335, ans=0.125
2023-11-28 01:20:09,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3288113.3333333335, ans=0.1
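Within an epoch, tot_loss is not the latest batch's loss but a frames-weighted running aggregate: the "over N frames" count climbs from roughly 688k frames at batch 50 to 1.2M at batch 100 and 1.6M at batch 150 above, and the audio_tagging_loss component relaxes from 0.01723 toward its steady value as more batches enter the average. One way to maintain such a statistic is sketched below (icefall keeps a similar per-component tracker; this class and its behavior are illustrative).

    from collections import defaultdict

    class LossTracker:
        """Frames-weighted running average of per-batch loss components."""

        def __init__(self):
            self.sums = defaultdict(float)  # sum of loss * frames
            self.frames = 0.0

        def update(self, losses: dict[str, float], num_frames: float) -> None:
            for name, value in losses.items():
                self.sums[name] += value * num_frames
            self.frames += num_frames

        def averages(self) -> dict[str, float]:
            return {k: v / self.frames for k, v in self.sums.items()}

    tracker = LossTracker()
    # Per-batch numbers from the Epoch 42, batch 50 and batch 100 lines above:
    tracker.update({"loss": 0.08895, "audio_tagging_loss": 0.01304}, 15841)
    tracker.update({"loss": 0.05473, "audio_tagging_loss": 0.01779}, 15189)
    print(tracker.averages(), "over", tracker.frames, "frames")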
2023-11-28 01:20:11,194 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 250, loss[loss=0.08107, simple_loss=0.1156, pruned_loss=0.01868, audio_tagging_loss=0.004562, over 15811.00 frames. ], tot_loss[loss=0.06914, simple_loss=0.09011, pruned_loss=0.01249, audio_tagging_loss=0.0116, over 2177275.63 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 32.0
2023-11-28 01:20:13,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3288180.0, ans=0.2
2023-11-28 01:20:19,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3288180.0, ans=0.125
2023-11-28 01:20:24,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3288246.6666666665, ans=0.0
2023-11-28 01:20:36,482 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493250
2023-11-28 01:20:38,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3288313.3333333335, ans=0.05
2023-11-28 01:20:50,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3288380.0, ans=0.125
2023-11-28 01:20:52,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.203e+01 9.691e+01 1.057e+02 1.267e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-28 01:21:04,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3288446.6666666665, ans=0.125
2023-11-28 01:21:09,022 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 300, loss[loss=0.06146, simple_loss=0.07897, pruned_loss=0.01372, audio_tagging_loss=0.008255, over 15541.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09045, pruned_loss=0.01245, audio_tagging_loss=0.01077, over 2372458.90 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0
2023-11-28 01:21:33,291 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493300
2023-11-28 01:21:38,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
2023-11-28 01:21:41,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.45 vs. limit=10.0
2023-11-28 01:21:45,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3288713.3333333335, ans=0.125
2023-11-28 01:21:48,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3288713.3333333335, ans=0.04949747468305833
2023-11-28 01:21:53,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0
2023-11-28 01:22:03,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3288780.0, ans=0.125
2023-11-28 01:22:06,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3288846.6666666665, ans=0.125
2023-11-28 01:22:06,950 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 350, loss[loss=0.05963, simple_loss=0.08634, pruned_loss=0.01034, audio_tagging_loss=0.006122, over 16153.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09054, pruned_loss=0.01257, audio_tagging_loss=0.01008, over 2519838.58 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 16.0
2023-11-28 01:22:29,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=12.0
2023-11-28 01:22:30,705 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493350
2023-11-28 01:22:49,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.671e+01 9.361e+01 1.014e+02 1.227e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-28 01:22:54,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3289113.3333333335, ans=0.125
2023-11-28 01:22:57,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0
2023-11-28 01:23:03,883 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 400, loss[loss=0.06814, simple_loss=0.09345, pruned_loss=0.01425, audio_tagging_loss=0.007164, over 16746.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09111, pruned_loss=0.01277, audio_tagging_loss=0.009667, over 2642308.65 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 32.0
2023-11-28 01:23:27,945 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493400
2023-11-28 01:23:31,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3289313.3333333335, ans=0.04949747468305833
2023-11-28 01:23:38,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3289380.0, ans=0.2
2023-11-28 01:23:50,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3289446.6666666665, ans=0.0
2023-11-28 01:24:01,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3289513.3333333335, ans=0.2
2023-11-28 01:24:01,998 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 450, loss[loss=0.04674, simple_loss=0.05981, pruned_loss=0.008163, audio_tagging_loss=0.008671, over 14864.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.08981, pruned_loss=0.01261, audio_tagging_loss=0.009443, over 2727117.47 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0
2023-11-28 01:24:14,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0
2023-11-28 01:24:18,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3289580.0, ans=0.015
2023-11-28 01:24:19,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3289580.0, ans=0.125
2023-11-28 01:24:26,404 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493450
2023-11-28 01:24:45,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.828e+01 9.254e+01 1.009e+02 1.850e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-28 01:24:50,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. limit=10.0
2023-11-28 01:24:52,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3289780.0, ans=0.0
2023-11-28 01:24:59,848 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 500, loss[loss=0.07032, simple_loss=0.09988, pruned_loss=0.01126, audio_tagging_loss=0.009126, over 14930.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.0897, pruned_loss=0.01251, audio_tagging_loss=0.009262, over 2795818.70 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0
2023-11-28 01:25:04,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3289846.6666666665, ans=10.0
2023-11-28 01:25:13,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289913.3333333335, ans=0.1
2023-11-28 01:25:17,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3289913.3333333335, ans=0.1
2023-11-28 01:25:17,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3289913.3333333335, ans=10.0
2023-11-28 01:25:18,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3289913.3333333335, ans=10.0
2023-11-28 01:25:23,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493500
2023-11-28 01:25:28,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0
2023-11-28 01:25:35,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3290046.6666666665, ans=0.2
2023-11-28 01:25:55,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3290113.3333333335, ans=0.125
2023-11-28 01:25:57,457 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 550, loss[loss=0.0609, simple_loss=0.08714, pruned_loss=0.01094, audio_tagging_loss=0.006391, over 15166.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08989, pruned_loss=0.01244, audio_tagging_loss=0.009169, over 2853989.60 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0
2023-11-28 01:26:13,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3290246.6666666665, ans=0.125
2023-11-28 01:26:21,497 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493550
2023-11-28 01:26:41,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.718e+01 9.476e+01 1.036e+02 1.288e+02, threshold=1.895e+02, percent-clipped=0.0
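The scaling.py:1022 lines compare a per-module whitening metric against a limit (e.g. metric=4.74 vs. limit=15.0). The idea is to keep each module's activation covariance spread across channels rather than concentrated in a few directions; a corrective gradient only kicks in once the metric exceeds the limit, which is why most entries above sit comfortably below it. One plausible formulation of such a metric is sketched below as an assumption, not the exact scaling.py computation: the ratio mean(trace(C^2)) / mean(trace(C))^2 per channel group equals 1.0 for perfectly "white" data and approaches channels_per_group when one direction dominates, which fits the logged limits of 6.0 to 22.5 for 128 to 512 channels.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """Illustrative whiteness metric of activations x (*, num_channels)."""
        x = x.reshape(-1, x.shape[-1])
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups  # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)           # centered covariance
        covar = torch.matmul(x.transpose(1, 2), x)    # (num_groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_diag_sq = (covar ** 2).sum() / (num_groups * cpg)
        return mean_diag_sq / (mean_diag ** 2 + 1e-20)

    feats = torch.randn(1000, 288)  # white data -> metric slightly above 1.0
    print(f"metric={whitening_metric(feats, num_groups=1):.2f} vs. limit=10.0")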
2023-11-28 01:26:55,511 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 600, loss[loss=0.07476, simple_loss=0.0942, pruned_loss=0.01618, audio_tagging_loss=0.01149, over 14649.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09047, pruned_loss=0.01242, audio_tagging_loss=0.008989, over 2893660.25 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0
2023-11-28 01:27:04,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3290513.3333333335, ans=0.125
2023-11-28 01:27:04,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3290513.3333333335, ans=0.2
2023-11-28 01:27:05,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3290580.0, ans=0.125
2023-11-28 01:27:20,224 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493600
2023-11-28 01:27:21,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3290646.6666666665, ans=0.2
2023-11-28 01:27:23,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3290646.6666666665, ans=0.125
2023-11-28 01:27:33,926 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-28 01:27:53,955 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 650, loss[loss=0.0625, simple_loss=0.08434, pruned_loss=0.01189, audio_tagging_loss=0.00844, over 13934.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09069, pruned_loss=0.01253, audio_tagging_loss=0.008978, over 2929550.74 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 16.0
2023-11-28 01:28:04,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3290913.3333333335, ans=0.1
2023-11-28 01:28:17,813 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493650
2023-11-28 01:28:38,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.722e+01 9.347e+01 9.970e+01 1.370e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-28 01:28:39,185 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 01:28:41,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3291113.3333333335, ans=0.125
2023-11-28 01:28:51,645 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 700, loss[loss=0.08932, simple_loss=0.13, pruned_loss=0.01878, audio_tagging_loss=0.00553, over 16745.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09193, pruned_loss=0.01261, audio_tagging_loss=0.009003, over 2954330.86 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 8.0
2023-11-28 01:28:52,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3291180.0, ans=0.07
2023-11-28 01:28:56,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=22.5
2023-11-28 01:29:03,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0
2023-11-28 01:29:15,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493700
2023-11-28 01:29:15,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3291313.3333333335, ans=0.125
2023-11-28 01:29:33,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3291380.0, ans=0.09899494936611666
2023-11-28 01:29:49,681 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 750, loss[loss=0.06138, simple_loss=0.08176, pruned_loss=0.01091, audio_tagging_loss=0.009591, over 17288.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09095, pruned_loss=0.01267, audio_tagging_loss=0.009013, over 2981109.28 frames. ], batch size: 64, lr: 1.62e-03, grad_scale: 4.0
2023-11-28 01:30:09,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3291580.0, ans=0.125
2023-11-28 01:30:12,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3291646.6666666665, ans=0.07
2023-11-28 01:30:13,879 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493750
2023-11-28 01:30:20,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3291646.6666666665, ans=0.0
2023-11-28 01:30:36,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.598e+01 9.375e+01 9.953e+01 1.444e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-28 01:30:47,078 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 800, loss[loss=0.08874, simple_loss=0.1225, pruned_loss=0.01955, audio_tagging_loss=0.007958, over 15350.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09044, pruned_loss=0.0127, audio_tagging_loss=0.009057, over 2998111.77 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 8.0
2023-11-28 01:30:50,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5
2023-11-28 01:30:51,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3291846.6666666665, ans=0.125
2023-11-28 01:30:57,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3291913.3333333335, ans=0.2
2023-11-28 01:31:09,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3291980.0, ans=0.2
2023-11-28 01:31:10,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3291980.0, ans=0.125
2023-11-28 01:31:10,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0
2023-11-28 01:31:11,480 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493800
2023-11-28 01:31:17,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0
2023-11-28 01:31:21,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3292046.6666666665, ans=0.125
2023-11-28 01:31:26,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0
2023-11-28 01:31:31,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3292046.6666666665, ans=0.1
2023-11-28 01:31:35,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3292113.3333333335, ans=0.125
2023-11-28 01:31:45,315 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 850, loss[loss=0.07538, simple_loss=0.1008, pruned_loss=0.01571, audio_tagging_loss=0.009248, over 14813.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09057, pruned_loss=0.01264, audio_tagging_loss=0.009021, over 3011011.09 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 8.0
2023-11-28 01:31:45,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3292180.0, ans=0.2
2023-11-28 01:31:48,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0
2023-11-28 01:32:03,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3292246.6666666665, ans=0.125
2023-11-28 01:32:07,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3292313.3333333335, ans=0.125
2023-11-28 01:32:07,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3292313.3333333335, ans=0.2
2023-11-28 01:32:09,981 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493850
2023-11-28 01:32:14,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5
2023-11-28 01:32:16,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3292313.3333333335, ans=0.04949747468305833
2023-11-28 01:32:26,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3292380.0, ans=10.0
2023-11-28 01:32:27,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3292380.0, ans=0.125
2023-11-28 01:32:31,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.796e+01 9.707e+01 1.029e+02 1.774e+02, threshold=1.941e+02, percent-clipped=0.0
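The logged lr holds at 1.64e-03 through epoch 41 and ticks down to 1.62e-03 when epoch 42 starts, i.e. the learning rate decays with both the batch count and the epoch count. The Eden schedule used by Zipformer recipes has exactly this shape; it is sketched below from its published description, with base_lr 0.045, lr_batches 7500 and lr_epochs 3.5 taken as assumptions rather than read from this run.

    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style learning rate: decays in both step and epoch."""
        step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor

    # With step ~ 492000 (the batch idx logged above), epochs 40 and 41 give
    # about 1.64e-03 and 1.62e-03, matching the logged transition at the
    # epoch 41/42 boundary if the scheduler counts completed epochs.
    print(eden_lr(0.045, step=492_000, epoch=40),
          eden_lr(0.045, step=492_000, epoch=41))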
], batch size: 59, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:33:07,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493900 2023-11-28 01:33:23,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3292713.3333333335, ans=0.125 2023-11-28 01:33:27,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0 2023-11-28 01:33:28,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3292780.0, ans=0.2 2023-11-28 01:33:36,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-28 01:33:41,071 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 950, loss[loss=0.07297, simple_loss=0.106, pruned_loss=0.01206, audio_tagging_loss=0.007905, over 16496.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.0911, pruned_loss=0.01244, audio_tagging_loss=0.008969, over 3028163.42 frames. ], batch size: 61, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:33:47,314 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:33:55,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3292913.3333333335, ans=0.1 2023-11-28 01:34:05,454 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493950 2023-11-28 01:34:13,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3292980.0, ans=0.125 2023-11-28 01:34:16,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.50 vs. limit=12.0 2023-11-28 01:34:27,165 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.099e+01 8.712e+01 9.268e+01 9.903e+01 1.259e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 01:34:35,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3293113.3333333335, ans=0.2 2023-11-28 01:34:38,903 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1000, loss[loss=0.03767, simple_loss=0.03789, pruned_loss=0.003795, audio_tagging_loss=0.01493, over 14763.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09016, pruned_loss=0.01233, audio_tagging_loss=0.008866, over 3023805.72 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:34:39,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3293180.0, ans=0.125 2023-11-28 01:34:43,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3293180.0, ans=0.5 2023-11-28 01:34:44,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3293180.0, ans=0.0 2023-11-28 01:34:45,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3293180.0, ans=0.125 2023-11-28 01:35:03,347 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494000 2023-11-28 01:35:05,783 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:35:13,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3293380.0, ans=0.125 2023-11-28 01:35:15,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-11-28 01:35:20,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3293380.0, ans=0.1 2023-11-28 01:35:22,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2023-11-28 01:35:24,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3293446.6666666665, ans=0.2 2023-11-28 01:35:37,061 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1050, loss[loss=0.05533, simple_loss=0.07701, pruned_loss=0.009075, audio_tagging_loss=0.00775, over 15101.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08975, pruned_loss=0.01215, audio_tagging_loss=0.00873, over 3030885.05 frames. 
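
The scaling.py:213 entries each print a ScheduledFloat: a regularization hyperparameter (balancer probabilities, skip rates, dropout) whose value is a piecewise-linear function of batch_count. By batch_count around 3.29M, every schedule seen here has long since settled at its final value (0.125, 0.1, 0.2, ...). A minimal sketch of such a schedule follows; the breakpoints are invented for illustration, the real ones live in icefall's scaling.py:

    class PiecewiseLinear:
        """Value that interpolates linearly between (batch_count, value) points."""
        def __init__(self, *points):
            self.points = sorted(points)
        def __call__(self, x: float) -> float:
            if x <= self.points[0][0]:
                return self.points[0][1]
            if x >= self.points[-1][0]:
                return self.points[-1][1]
            for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
                if x0 <= x <= x1:
                    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))  # hypothetical endpoints
    print(dropout_p(3293180.0))  # 0.1 -- flat long before this point in training
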
], batch size: 57, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:35:37,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3293513.3333333335, ans=0.125 2023-11-28 01:35:40,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3293513.3333333335, ans=0.1 2023-11-28 01:35:40,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3293513.3333333335, ans=0.0 2023-11-28 01:35:55,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3293580.0, ans=0.0 2023-11-28 01:36:01,749 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494050 2023-11-28 01:36:23,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.510e+01 9.155e+01 1.003e+02 1.223e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-28 01:36:26,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3293780.0, ans=0.125 2023-11-28 01:36:35,312 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1100, loss[loss=0.06147, simple_loss=0.08279, pruned_loss=0.009686, audio_tagging_loss=0.01039, over 15625.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08909, pruned_loss=0.01195, audio_tagging_loss=0.00868, over 3036118.40 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:36:39,870 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:36:46,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3293913.3333333335, ans=10.0 2023-11-28 01:36:58,969 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494100 2023-11-28 01:37:00,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2023-11-28 01:37:21,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3294113.3333333335, ans=0.125 2023-11-28 01:37:32,932 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1150, loss[loss=0.06441, simple_loss=0.09851, pruned_loss=0.008933, audio_tagging_loss=0.006224, over 15430.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09122, pruned_loss=0.01234, audio_tagging_loss=0.008585, over 3036171.31 frames. 
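
The WARNING just above drops an AudioSet cut for a concrete reason: its placeholder transcript is longer than the encoder output. The 100 input frames shrink to 23 after the convolutional front end, fewer than the 24 BPE tokens, so the transducer has no valid alignment. The 100 -> 23 reduction is consistent with t_out = ((t_in - 7) // 2 + 1) // 2; a sketch of the filter under that assumption (the function and names are illustrative):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frames surviving the subsampling front end; ((100 - 7) // 2 + 1) // 2 == 23,
        # matching the before/after counts in the warning. Formula inferred from the log.
        num_frames_after = ((num_frames - 7) // 2 + 1) // 2
        return num_frames_after >= num_tokens

    print(keep_cut(100, 24))  # False -> the cut is excluded, as logged
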
], batch size: 57, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:37:46,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3294246.6666666665, ans=0.125 2023-11-28 01:37:57,719 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494150 2023-11-28 01:38:12,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3294380.0, ans=0.1 2023-11-28 01:38:18,792 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 8.708e+01 9.340e+01 1.012e+02 1.442e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 01:38:29,990 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1200, loss[loss=0.05833, simple_loss=0.0853, pruned_loss=0.008372, audio_tagging_loss=0.007307, over 15668.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09114, pruned_loss=0.01244, audio_tagging_loss=0.008535, over 3035736.85 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:38:31,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3294513.3333333335, ans=0.95 2023-11-28 01:38:32,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3294513.3333333335, ans=0.0 2023-11-28 01:38:33,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3294513.3333333335, ans=0.125 2023-11-28 01:38:36,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=22.5 2023-11-28 01:38:37,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3294513.3333333335, ans=0.0 2023-11-28 01:38:46,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2023-11-28 01:38:54,973 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494200 2023-11-28 01:38:55,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3294646.6666666665, ans=10.0 2023-11-28 01:39:09,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3294713.3333333335, ans=0.0 2023-11-28 01:39:10,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2023-11-28 01:39:13,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-28 01:39:29,036 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1250, loss[loss=0.04291, simple_loss=0.05564, pruned_loss=0.007674, audio_tagging_loss=0.007415, over 14716.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08992, pruned_loss=0.01233, audio_tagging_loss=0.008541, over 3041547.90 frames. 
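
The optim.py:476 lines summarize recently observed gradient norms as five quantiles (min, 25%, 50%, 75%, max) and clip at Clipping_scale times the median: in the entry above, threshold=1.868e+02 is exactly 2.0 times the logged median 9.340e+01. A sketch of that bookkeeping, using a stand-in sample of norms:

    import torch

    norms = torch.tensor([78.09, 87.08, 93.40, 101.2, 144.2])  # stand-in sample
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = 2.0 * q[2]                        # Clipping_scale=2.0 times median
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    print(threshold.item())                       # 186.8, cf. threshold=1.868e+02
    print(percent_clipped.item())                 # 0.0, cf. percent-clipped=0.0
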
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:39:42,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3294913.3333333335, ans=0.0 2023-11-28 01:39:50,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3294980.0, ans=0.0 2023-11-28 01:39:52,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494250 2023-11-28 01:39:54,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2023-11-28 01:39:57,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3294980.0, ans=0.0 2023-11-28 01:39:58,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3294980.0, ans=0.2 2023-11-28 01:40:00,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-28 01:40:13,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3295046.6666666665, ans=0.125 2023-11-28 01:40:15,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.552e+01 9.215e+01 9.963e+01 1.305e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-28 01:40:26,868 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1300, loss[loss=0.04423, simple_loss=0.0466, pruned_loss=0.008553, audio_tagging_loss=0.01237, over 14645.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08868, pruned_loss=0.01211, audio_tagging_loss=0.008692, over 3036910.92 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:40:32,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3295180.0, ans=0.1 2023-11-28 01:40:39,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3295246.6666666665, ans=0.2 2023-11-28 01:40:50,486 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494300 2023-11-28 01:40:52,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3295313.3333333335, ans=0.125 2023-11-28 01:40:58,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=12.0 2023-11-28 01:41:16,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3295446.6666666665, ans=0.0 2023-11-28 01:41:17,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3295446.6666666665, ans=0.125 2023-11-28 01:41:23,998 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1350, loss[loss=0.07027, simple_loss=0.09676, pruned_loss=0.01417, audio_tagging_loss=0.007726, over 14083.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08998, pruned_loss=0.0124, audio_tagging_loss=0.008559, over 3044136.00 frames. 
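
The scaling.py:1022 Whitening lines compare a whiteness statistic of a module's activations against a limit (metric=9.47 vs. limit=15.0 and so on); only when the metric exceeds the limit does the Whiten module apply a corrective gradient that pushes the feature covariance back toward a multiple of the identity. One plausible such statistic, sketched below, is the mean squared eigenvalue of the covariance divided by the squared mean eigenvalue; this exact formula is an assumption, the real one is in icefall's scaling.py:

    import torch

    def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). About 1.0 for white features
        # (up to sampling noise); grows as variance concentrates in
        # a few directions.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    x = torch.randn(2000, 384)
    print(whiteness_metric(x).item())                                  # ~1.2
    print(whiteness_metric(x * torch.linspace(0.1, 3.0, 384)).item())  # much larger
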
], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:41:48,824 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494350 2023-11-28 01:41:53,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-28 01:41:54,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3295646.6666666665, ans=0.0 2023-11-28 01:41:55,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-28 01:42:05,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3295713.3333333335, ans=0.0 2023-11-28 01:42:08,155 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:42:09,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3295780.0, ans=0.05 2023-11-28 01:42:10,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.515e+01 9.138e+01 9.769e+01 1.555e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 01:42:15,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3295780.0, ans=0.125 2023-11-28 01:42:19,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3295780.0, ans=0.125 2023-11-28 01:42:20,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=15.0 2023-11-28 01:42:22,287 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1400, loss[loss=0.06066, simple_loss=0.08508, pruned_loss=0.0108, audio_tagging_loss=0.007314, over 15237.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09, pruned_loss=0.01255, audio_tagging_loss=0.008631, over 3047887.31 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:42:45,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3295980.0, ans=0.0 2023-11-28 01:42:46,752 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494400 2023-11-28 01:42:49,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2023-11-28 01:43:05,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2023-11-28 01:43:20,763 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1450, loss[loss=0.05615, simple_loss=0.07883, pruned_loss=0.008717, audio_tagging_loss=0.008018, over 14119.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08969, pruned_loss=0.01251, audio_tagging_loss=0.008754, over 3048945.50 frames. 
], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:43:24,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3296180.0, ans=0.1 2023-11-28 01:43:28,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-28 01:43:31,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3296246.6666666665, ans=0.125 2023-11-28 01:43:44,162 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494450 2023-11-28 01:43:50,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.09 vs. limit=15.0 2023-11-28 01:44:06,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 8.990e+01 9.437e+01 1.012e+02 1.630e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 01:44:17,586 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1500, loss[loss=0.05774, simple_loss=0.07929, pruned_loss=0.008705, audio_tagging_loss=0.009391, over 14197.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08922, pruned_loss=0.01236, audio_tagging_loss=0.008824, over 3044118.04 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:44:20,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3296513.3333333335, ans=0.0 2023-11-28 01:44:37,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3296580.0, ans=0.1 2023-11-28 01:44:42,238 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494500 2023-11-28 01:45:05,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3296780.0, ans=0.0 2023-11-28 01:45:07,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3296780.0, ans=0.125 2023-11-28 01:45:15,919 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1550, loss[loss=0.08521, simple_loss=0.1155, pruned_loss=0.01873, audio_tagging_loss=0.008714, over 14907.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08875, pruned_loss=0.01232, audio_tagging_loss=0.008914, over 3040145.45 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:45:18,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3296846.6666666665, ans=0.125 2023-11-28 01:45:28,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3296913.3333333335, ans=0.2 2023-11-28 01:45:40,189 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494550 2023-11-28 01:45:41,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3296980.0, ans=0.09899494936611666 2023-11-28 01:45:41,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3296980.0, ans=0.125 2023-11-28 01:45:43,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3296980.0, ans=0.125 2023-11-28 01:45:46,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3296980.0, ans=0.125 2023-11-28 01:45:55,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3297046.6666666665, ans=0.125 2023-11-28 01:45:56,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.14 vs. limit=10.0 2023-11-28 01:46:01,877 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.675e+01 9.125e+01 9.756e+01 1.252e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-28 01:46:13,993 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1600, loss[loss=0.07875, simple_loss=0.1028, pruned_loss=0.01913, audio_tagging_loss=0.008211, over 15799.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08882, pruned_loss=0.01245, audio_tagging_loss=0.008972, over 3041687.70 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:46:17,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3297180.0, ans=0.1 2023-11-28 01:46:27,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3297246.6666666665, ans=0.125 2023-11-28 01:46:31,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2023-11-28 01:46:37,075 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494600 2023-11-28 01:46:39,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3297313.3333333335, ans=0.1 2023-11-28 01:47:09,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3297513.3333333335, ans=0.125 2023-11-28 01:47:10,423 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1650, loss[loss=0.06566, simple_loss=0.09819, pruned_loss=0.007568, audio_tagging_loss=0.008997, over 15292.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09011, pruned_loss=0.0126, audio_tagging_loss=0.008955, over 3037735.17 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:47:21,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3297580.0, ans=0.125 2023-11-28 01:47:29,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3297580.0, ans=0.125 2023-11-28 01:47:34,420 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494650 2023-11-28 01:47:57,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.801e+01 9.426e+01 1.009e+02 1.226e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 01:47:57,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3297780.0, ans=0.07 2023-11-28 01:47:59,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5 2023-11-28 01:48:08,364 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1700, loss[loss=0.07248, simple_loss=0.1018, pruned_loss=0.0128, audio_tagging_loss=0.008789, over 16109.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08999, pruned_loss=0.01242, audio_tagging_loss=0.009018, over 3039140.69 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:48:25,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3297913.3333333335, ans=0.125 2023-11-28 01:48:32,322 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494700 2023-11-28 01:48:42,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-11-28 01:48:43,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.74 vs. limit=15.0 2023-11-28 01:48:50,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3298046.6666666665, ans=0.035 2023-11-28 01:48:55,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3298113.3333333335, ans=0.125 2023-11-28 01:48:55,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2023-11-28 01:49:04,879 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1750, loss[loss=0.06196, simple_loss=0.09041, pruned_loss=0.008315, audio_tagging_loss=0.008441, over 14660.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09155, pruned_loss=0.01274, audio_tagging_loss=0.008907, over 3041847.36 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:49:05,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.71 vs. 
limit=15.0 2023-11-28 01:49:17,054 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:49:22,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3298246.6666666665, ans=0.0 2023-11-28 01:49:23,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3298246.6666666665, ans=0.0 2023-11-28 01:49:29,037 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494750 2023-11-28 01:49:32,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3298313.3333333335, ans=0.05 2023-11-28 01:49:40,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3298380.0, ans=0.2 2023-11-28 01:49:52,445 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.816e+01 9.529e+01 1.029e+02 1.383e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 01:49:54,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-11-28 01:50:00,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3298446.6666666665, ans=0.125 2023-11-28 01:50:02,959 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1800, loss[loss=0.06067, simple_loss=0.08564, pruned_loss=0.008716, audio_tagging_loss=0.009128, over 15451.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09084, pruned_loss=0.01264, audio_tagging_loss=0.008858, over 3039964.60 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:50:07,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3298513.3333333335, ans=0.0 2023-11-28 01:50:09,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3298513.3333333335, ans=0.04949747468305833 2023-11-28 01:50:12,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3298580.0, ans=0.0 2023-11-28 01:50:14,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298580.0, ans=0.1 2023-11-28 01:50:20,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3298580.0, ans=0.125 2023-11-28 01:50:26,976 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494800 2023-11-28 01:50:28,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3298646.6666666665, ans=0.2 2023-11-28 01:50:30,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3298646.6666666665, ans=0.2 2023-11-28 01:51:00,589 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1850, loss[loss=0.08001, simple_loss=0.1125, pruned_loss=0.01517, audio_tagging_loss=0.008566, over 16723.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09059, pruned_loss=0.01242, audio_tagging_loss=0.008691, over 3043366.91 frames. 
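
The grad_scale column climbed 4.0 -> 8.0 -> 16.0 -> 32.0 between batches 750 and 1600, then fell back to 16.0 by batch 1700: the signature of dynamic fp16 loss scaling, which halves the scale whenever a step overflows and doubles it again after a long enough run of clean steps. A toy loop with torch.cuda.amp.GradScaler showing the same dynamics (the model and data are stand-ins, not the training script's objects):

    import torch

    model = torch.nn.Linear(10, 1).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)
    for _ in range(5):
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(8, 10, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.step(opt)   # skipped when the unscaled gradients contain inf/nan
        scaler.update()    # halves the scale on overflow, doubles it after
                           # growth_interval consecutive clean steps
        print(scaler.get_scale())  # the grad_scale value seen in the log
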
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:51:06,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3298846.6666666665, ans=0.1 2023-11-28 01:51:07,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-11-28 01:51:12,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3298913.3333333335, ans=0.125 2023-11-28 01:51:14,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3298913.3333333335, ans=0.125 2023-11-28 01:51:25,211 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494850 2023-11-28 01:51:29,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3298980.0, ans=0.1 2023-11-28 01:51:34,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-28 01:51:48,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.704e+01 9.342e+01 1.015e+02 1.516e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 01:51:58,687 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1900, loss[loss=0.07292, simple_loss=0.1065, pruned_loss=0.01188, audio_tagging_loss=0.007813, over 15459.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.0892, pruned_loss=0.01228, audio_tagging_loss=0.008734, over 3041953.58 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:52:11,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-28 01:52:15,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-28 01:52:15,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-28 01:52:22,938 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494900 2023-11-28 01:52:40,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3299380.0, ans=0.04949747468305833 2023-11-28 01:52:41,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3299380.0, ans=0.125 2023-11-28 01:52:52,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2023-11-28 01:52:56,198 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1950, loss[loss=0.07511, simple_loss=0.1037, pruned_loss=0.01484, audio_tagging_loss=0.008441, over 15448.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08992, pruned_loss=0.01244, audio_tagging_loss=0.008697, over 3045932.82 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:53:20,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494950 2023-11-28 01:53:36,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2023-11-28 01:53:42,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3299780.0, ans=0.0 2023-11-28 01:53:43,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.727e+01 9.410e+01 1.013e+02 1.318e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 01:53:44,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3299780.0, ans=0.2 2023-11-28 01:53:47,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3299780.0, ans=10.0 2023-11-28 01:53:53,605 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2000, loss[loss=0.05815, simple_loss=0.07469, pruned_loss=0.01199, audio_tagging_loss=0.008804, over 15550.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08959, pruned_loss=0.01248, audio_tagging_loss=0.008671, over 3041266.04 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:54:17,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495000 2023-11-28 01:54:18,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2023-11-28 01:54:44,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3300113.3333333335, ans=0.0 2023-11-28 01:54:46,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3300113.3333333335, ans=0.125 2023-11-28 01:54:51,447 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2050, loss[loss=0.05334, simple_loss=0.06848, pruned_loss=0.008968, audio_tagging_loss=0.01013, over 14948.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08905, pruned_loss=0.01236, audio_tagging_loss=0.008712, over 3037101.79 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:55:00,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2023-11-28 01:55:15,647 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495050 2023-11-28 01:55:16,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3300313.3333333335, ans=0.2 2023-11-28 01:55:27,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.96 vs. limit=10.0 2023-11-28 01:55:32,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-11-28 01:55:38,625 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.786e+01 9.334e+01 1.004e+02 1.293e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 01:55:40,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.10 vs. 
limit=15.0 2023-11-28 01:55:45,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3300446.6666666665, ans=0.035 2023-11-28 01:55:49,308 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2100, loss[loss=0.04944, simple_loss=0.06445, pruned_loss=0.005685, audio_tagging_loss=0.01153, over 14699.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08908, pruned_loss=0.01254, audio_tagging_loss=0.008614, over 3032472.87 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:55:57,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3300513.3333333335, ans=0.125 2023-11-28 01:55:59,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3300513.3333333335, ans=0.07 2023-11-28 01:56:11,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-28 01:56:13,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3300646.6666666665, ans=0.5 2023-11-28 01:56:14,136 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495100 2023-11-28 01:56:47,189 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2150, loss[loss=0.07621, simple_loss=0.1037, pruned_loss=0.01513, audio_tagging_loss=0.009218, over 14776.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09003, pruned_loss=0.01258, audio_tagging_loss=0.008636, over 3034652.17 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:56:47,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3300846.6666666665, ans=0.2 2023-11-28 01:56:50,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3300846.6666666665, ans=0.0 2023-11-28 01:57:11,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495150 2023-11-28 01:57:22,703 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:57:34,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.726e+01 9.403e+01 1.016e+02 1.279e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 01:57:43,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3301113.3333333335, ans=0.125 2023-11-28 01:57:45,115 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2200, loss[loss=0.06837, simple_loss=0.09901, pruned_loss=0.01297, audio_tagging_loss=0.005894, over 15999.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08942, pruned_loss=0.01254, audio_tagging_loss=0.008675, over 3036809.42 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:57:45,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3301180.0, ans=0.125 2023-11-28 01:57:45,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3301180.0, ans=0.125 2023-11-28 01:57:53,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0 2023-11-28 01:57:56,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3301246.6666666665, ans=0.0 2023-11-28 01:58:01,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3301246.6666666665, ans=0.125 2023-11-28 01:58:07,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3301313.3333333335, ans=0.0 2023-11-28 01:58:08,878 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495200 2023-11-28 01:58:19,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3301380.0, ans=0.125 2023-11-28 01:58:26,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5 2023-11-28 01:58:31,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3301446.6666666665, ans=15.0 2023-11-28 01:58:39,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3301446.6666666665, ans=0.1 2023-11-28 01:58:42,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2023-11-28 01:58:43,003 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2250, loss[loss=0.08254, simple_loss=0.1123, pruned_loss=0.01526, audio_tagging_loss=0.01113, over 15288.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08905, pruned_loss=0.01248, audio_tagging_loss=0.008785, over 3040223.12 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 01:58:54,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2023-11-28 01:58:56,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.11 vs. 
limit=22.5 2023-11-28 01:59:01,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3301580.0, ans=0.2 2023-11-28 01:59:07,458 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495250 2023-11-28 01:59:07,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3301646.6666666665, ans=0.125 2023-11-28 01:59:29,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.696e+01 9.309e+01 9.993e+01 1.259e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 01:59:39,876 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2300, loss[loss=0.07687, simple_loss=0.09611, pruned_loss=0.01864, audio_tagging_loss=0.01018, over 14951.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.0895, pruned_loss=0.01246, audio_tagging_loss=0.008842, over 3034974.00 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 01:59:43,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301846.6666666665, ans=0.1 2023-11-28 02:00:00,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3301913.3333333335, ans=0.0 2023-11-28 02:00:04,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495300 2023-11-28 02:00:04,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3301980.0, ans=0.09899494936611666 2023-11-28 02:00:22,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3302046.6666666665, ans=0.035 2023-11-28 02:00:23,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3302046.6666666665, ans=0.05 2023-11-28 02:00:28,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3302113.3333333335, ans=0.0 2023-11-28 02:00:29,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3302113.3333333335, ans=0.0 2023-11-28 02:00:32,678 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:00:35,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3302113.3333333335, ans=0.0 2023-11-28 02:00:38,607 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2350, loss[loss=0.06245, simple_loss=0.08774, pruned_loss=0.01171, audio_tagging_loss=0.006869, over 15070.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0901, pruned_loss=0.0124, audio_tagging_loss=0.00891, over 3039293.32 frames. 
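
The tot_loss[...] aggregates are evidently not plain sums: the frame totals carry fractions (here "over 3039293.32 frames"), which points to an exponentially decayed accumulator, i.e. tot_loss behaves as a moving average dominated by the last few thousand batches rather than the whole epoch. A sketch of that style of bookkeeping; the decay constant, numbers, and names are all assumptions:

    # Decayed running totals; one plausible reading of the fractional frame counts.
    decay = 0.999                                               # assumed
    batch_stats = [(1050.0, 16000.0), (990.0, 15200.0),
                   (1020.0, 15600.0)]                           # hypothetical (loss_sum, frames)
    tot_loss_sum = tot_frames = 0.0
    for loss_sum, frames in batch_stats:
        tot_loss_sum = tot_loss_sum * decay + loss_sum
        tot_frames = tot_frames * decay + frames                # fractional totals
    print(tot_loss_sum / tot_frames)                            # per-frame tot_loss
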
], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:00:44,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3302180.0, ans=0.2 2023-11-28 02:00:45,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3302180.0, ans=0.125 2023-11-28 02:00:52,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.76 vs. limit=15.0 2023-11-28 02:00:54,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3302246.6666666665, ans=0.125 2023-11-28 02:01:02,395 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495350 2023-11-28 02:01:06,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=12.0 2023-11-28 02:01:12,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3302380.0, ans=0.125 2023-11-28 02:01:25,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.943e+01 9.346e+01 1.021e+02 1.230e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:01:28,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3302446.6666666665, ans=0.2 2023-11-28 02:01:36,075 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2400, loss[loss=0.08685, simple_loss=0.122, pruned_loss=0.01737, audio_tagging_loss=0.008497, over 14688.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.0901, pruned_loss=0.01231, audio_tagging_loss=0.009016, over 3041577.37 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:01:59,815 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495400 2023-11-28 02:02:01,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3302646.6666666665, ans=0.1 2023-11-28 02:02:04,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3302646.6666666665, ans=0.1 2023-11-28 02:02:05,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3302646.6666666665, ans=12.0 2023-11-28 02:02:08,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-11-28 02:02:20,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-11-28 02:02:26,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3302780.0, ans=0.125 2023-11-28 02:02:29,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3302780.0, ans=0.125 2023-11-28 02:02:32,968 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2450, loss[loss=0.06072, simple_loss=0.07907, pruned_loss=0.01278, audio_tagging_loss=0.008406, over 15702.00 frames. 
], tot_loss[loss=0.06651, simple_loss=0.09041, pruned_loss=0.01237, audio_tagging_loss=0.008934, over 3041097.79 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:02:34,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3302846.6666666665, ans=0.125 2023-11-28 02:02:37,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3302846.6666666665, ans=0.125 2023-11-28 02:02:54,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-28 02:02:57,436 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495450 2023-11-28 02:02:58,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.89 vs. limit=10.0 2023-11-28 02:03:03,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3302980.0, ans=0.2 2023-11-28 02:03:21,011 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.760e+01 9.295e+01 9.959e+01 1.249e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 02:03:31,360 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2500, loss[loss=0.04692, simple_loss=0.05638, pruned_loss=0.007712, audio_tagging_loss=0.01102, over 13501.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08977, pruned_loss=0.01224, audio_tagging_loss=0.009013, over 3035802.32 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:03:35,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3303180.0, ans=0.0 2023-11-28 02:03:48,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3303246.6666666665, ans=0.025 2023-11-28 02:03:54,987 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495500 2023-11-28 02:03:55,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3303313.3333333335, ans=0.0 2023-11-28 02:04:02,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3303313.3333333335, ans=0.0 2023-11-28 02:04:03,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2023-11-28 02:04:28,458 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2550, loss[loss=0.07009, simple_loss=0.1012, pruned_loss=0.01141, audio_tagging_loss=0.008066, over 15560.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.0905, pruned_loss=0.01241, audio_tagging_loss=0.008936, over 3033212.08 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:04:35,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3303513.3333333335, ans=0.125 2023-11-28 02:04:49,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.38 vs. 
limit=15.0 2023-11-28 02:04:52,392 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495550 2023-11-28 02:04:59,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-11-28 02:05:08,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2023-11-28 02:05:11,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-11-28 02:05:17,227 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.589e+01 9.196e+01 9.860e+01 1.420e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 02:05:26,158 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2600, loss[loss=0.05991, simple_loss=0.08449, pruned_loss=0.007479, audio_tagging_loss=0.01019, over 15256.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.0897, pruned_loss=0.01225, audio_tagging_loss=0.008773, over 3038260.59 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:05:49,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3303980.0, ans=0.125 2023-11-28 02:05:50,759 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495600 2023-11-28 02:06:08,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3304046.6666666665, ans=0.0 2023-11-28 02:06:15,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3304113.3333333335, ans=0.0 2023-11-28 02:06:23,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3304180.0, ans=0.0 2023-11-28 02:06:24,284 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2650, loss[loss=0.06214, simple_loss=0.08305, pruned_loss=0.01233, audio_tagging_loss=0.008278, over 15442.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08976, pruned_loss=0.01236, audio_tagging_loss=0.008734, over 3040294.49 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:06:27,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3304180.0, ans=0.125 2023-11-28 02:06:27,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3304180.0, ans=0.0 2023-11-28 02:06:30,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-28 02:06:33,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3304180.0, ans=0.125 2023-11-28 02:06:42,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.32 vs. 
limit=10.0 2023-11-28 02:06:48,580 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495650 2023-11-28 02:06:53,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3304313.3333333335, ans=0.2 2023-11-28 02:06:56,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3304313.3333333335, ans=0.0 2023-11-28 02:06:56,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3304313.3333333335, ans=0.125 2023-11-28 02:06:58,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3304380.0, ans=0.0 2023-11-28 02:07:03,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3304380.0, ans=0.0 2023-11-28 02:07:13,631 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.568e+01 9.215e+01 1.005e+02 1.316e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-28 02:07:17,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3304446.6666666665, ans=0.0 2023-11-28 02:07:21,988 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2700, loss[loss=0.06682, simple_loss=0.08627, pruned_loss=0.0143, audio_tagging_loss=0.009387, over 15287.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09031, pruned_loss=0.01264, audio_tagging_loss=0.008649, over 3039447.58 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:07:23,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3304513.3333333335, ans=0.1 2023-11-28 02:07:25,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.38 vs. limit=22.5 2023-11-28 02:07:28,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3304513.3333333335, ans=0.125 2023-11-28 02:07:37,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3304580.0, ans=0.125 2023-11-28 02:07:45,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495700 2023-11-28 02:07:53,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-28 02:07:56,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3304713.3333333335, ans=0.125 2023-11-28 02:08:05,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3304713.3333333335, ans=0.0 2023-11-28 02:08:19,869 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2750, loss[loss=0.05527, simple_loss=0.07022, pruned_loss=0.007572, audio_tagging_loss=0.01259, over 15179.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09022, pruned_loss=0.01256, audio_tagging_loss=0.008625, over 3035984.79 frames. 
2023-11-28 02:08:30,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3304913.3333333335, ans=0.125
2023-11-28 02:08:33,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3304913.3333333335, ans=0.0
2023-11-28 02:08:35,225 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 02:08:43,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495750
2023-11-28 02:08:45,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3304980.0, ans=0.125
2023-11-28 02:08:54,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3305046.6666666665, ans=0.2
2023-11-28 02:08:58,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3305046.6666666665, ans=0.2
2023-11-28 02:09:09,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.832e+01 9.441e+01 1.011e+02 1.289e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-28 02:09:10,583 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 02:09:10,875 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 02:09:17,085 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2800, loss[loss=0.06116, simple_loss=0.07652, pruned_loss=0.01124, audio_tagging_loss=0.01166, over 15461.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08916, pruned_loss=0.01231, audio_tagging_loss=0.008648, over 3032977.33 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:09:31,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3305246.6666666665, ans=0.2
2023-11-28 02:09:34,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3305246.6666666665, ans=0.0
2023-11-28 02:09:39,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3305246.6666666665, ans=0.125
2023-11-28 02:09:42,229 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495800
2023-11-28 02:09:48,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2023-11-28 02:09:54,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0
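The WARNING above drops AudioSet placeholder cuts whose encoder output would be shorter than their token sequence (23 frames after subsampling vs. 24 tokens), since the transducer loss needs at least one output frame per token. A sketch of the check, assuming the usual icefall convolutional subsampling length formula, which reproduces the logged 100 -> 23:

```python
# Hedged sketch of the cut-exclusion rule behind the WARNING lines.
# Assumption: output length of the ~4x conv subsampling is ((T-7)//2+1)//2.
def frames_after_subsampling(t: int) -> int:
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A cut is usable only if it keeps at least one frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as logged
print(keep_cut(100, 24))              # False -> cut is excluded
```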
2023-11-28 02:10:00,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3305380.0, ans=0.125
2023-11-28 02:10:09,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0
2023-11-28 02:10:14,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3305513.3333333335, ans=0.1
2023-11-28 02:10:15,075 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2850, loss[loss=0.06459, simple_loss=0.08044, pruned_loss=0.0129, audio_tagging_loss=0.01148, over 16503.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08887, pruned_loss=0.01226, audio_tagging_loss=0.008645, over 3033868.27 frames. ], batch size: 64, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:10:17,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3305513.3333333335, ans=0.0
2023-11-28 02:10:19,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. limit=22.5
2023-11-28 02:10:25,266 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 02:10:26,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3305580.0, ans=0.2
2023-11-28 02:10:26,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=22.5
2023-11-28 02:10:27,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3305580.0, ans=0.2
2023-11-28 02:10:31,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0
2023-11-28 02:10:38,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3305646.6666666665, ans=0.125
2023-11-28 02:10:39,399 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495850
2023-11-28 02:11:04,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3305780.0, ans=0.125
2023-11-28 02:11:04,939 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.806e+01 9.389e+01 1.005e+02 1.417e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-28 02:11:12,654 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2900, loss[loss=0.06192, simple_loss=0.07883, pruned_loss=0.01465, audio_tagging_loss=0.007855, over 15927.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08913, pruned_loss=0.0123, audio_tagging_loss=0.008648, over 3041414.52 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0
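In the optim.py:476 lines, the reported threshold tracks Clipping_scale times the median gradient norm (2.0 * 9.389e+01 ≈ 1.878e+02 just above), and percent-clipped reports how often norms exceeded it. A hedged sketch of how such statistics could be computed over a window of recent gradient norms; this is an illustration, not the exact ScaledAdam implementation:

```python
import torch

# Sketch: quartiles of recent grad norms plus a clipping threshold of
# clipping_scale * median, matching the arithmetic in the log lines.
def clip_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # 2.0 * median
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped

q, thr, pc = clip_stats(torch.tensor([74.3, 88.1, 93.9, 100.5, 141.7]))
print(thr)  # ~187.8, c.f. threshold=1.878e+02 above
```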
2023-11-28 02:11:13,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3305846.6666666665, ans=0.125
2023-11-28 02:11:27,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3305913.3333333335, ans=0.0
2023-11-28 02:11:36,724 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495900
2023-11-28 02:11:40,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3305980.0, ans=0.0
2023-11-28 02:11:47,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3306046.6666666665, ans=0.125
2023-11-28 02:11:58,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=22.5
2023-11-28 02:12:09,959 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2950, loss[loss=0.06618, simple_loss=0.09223, pruned_loss=0.0115, audio_tagging_loss=0.008571, over 15617.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08954, pruned_loss=0.01229, audio_tagging_loss=0.008727, over 3043956.11 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:12:22,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3306246.6666666665, ans=0.1
2023-11-28 02:12:23,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5
2023-11-28 02:12:34,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495950
2023-11-28 02:12:51,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3306380.0, ans=0.125
2023-11-28 02:13:01,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.902e+01 9.555e+01 1.025e+02 1.277e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-28 02:13:07,695 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3000, loss[loss=0.05415, simple_loss=0.07289, pruned_loss=0.00792, audio_tagging_loss=0.009781, over 14833.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08966, pruned_loss=0.01242, audio_tagging_loss=0.008791, over 3041711.41 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 02:13:07,696 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-28 02:13:42,049 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05767, simple_loss=0.05061, pruned_loss=0.005183, audio_tagging_loss=0.02719, over 4681554.00 frames.
2023-11-28 02:13:42,050 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-28 02:13:57,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3306580.0, ans=0.2
2023-11-28 02:14:05,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
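The validation block above ends with a peak-memory line, which presumably reads PyTorch's per-device peak-allocation counter after the validation pass. A minimal sketch, assuming that counter, that would reproduce the format of the "24894MB" message:

```python
import torch

# Sketch of the "Maximum memory allocated so far is ...MB" log line,
# assuming it reports torch.cuda.max_memory_allocated for this rank's GPU.
def log_peak_memory(device: torch.device) -> None:
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
```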
2023-11-28 02:14:05,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496000
2023-11-28 02:14:07,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3306646.6666666665, ans=0.2
2023-11-28 02:14:14,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0
2023-11-28 02:14:42,113 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3050, loss[loss=0.06849, simple_loss=0.08782, pruned_loss=0.01497, audio_tagging_loss=0.009617, over 15215.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09003, pruned_loss=0.01254, audio_tagging_loss=0.008931, over 3041148.66 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 02:14:44,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3306846.6666666665, ans=0.125
2023-11-28 02:15:05,662 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496050
2023-11-28 02:15:08,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3306980.0, ans=0.125
2023-11-28 02:15:09,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3306980.0, ans=0.0
2023-11-28 02:15:16,126 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 02:15:28,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3307113.3333333335, ans=0.125
2023-11-28 02:15:32,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.076e+01 8.885e+01 9.431e+01 1.019e+02 1.276e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-28 02:15:39,158 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3100, loss[loss=0.07145, simple_loss=0.1016, pruned_loss=0.01133, audio_tagging_loss=0.009325, over 15215.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09002, pruned_loss=0.01254, audio_tagging_loss=0.008916, over 3040542.27 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 02:16:03,437 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496100
2023-11-28 02:16:30,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3307446.6666666665, ans=0.125
2023-11-28 02:16:36,622 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3150, loss[loss=0.06854, simple_loss=0.09403, pruned_loss=0.01126, audio_tagging_loss=0.01026, over 14501.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09086, pruned_loss=0.01247, audio_tagging_loss=0.008942, over 3047133.89 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 8.0
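The scaling.py:1022 Whitening lines (e.g. metric=11.67 vs. limit=15.0 above) compare a dispersion statistic of the feature covariance against a limit; a penalty is applied only when the metric exceeds the limit. A rough sketch of one plausible such metric, normalized so perfectly "white" features score 1.0 and dominated directions score higher; this is an illustration under that assumption, not the exact scaling.py code:

```python
import torch

# Sketch of a whitening metric: eigenvalue dispersion of the covariance,
# equal to 1.0 when all eigenvalues are equal (fully white features).
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), treated as a single group
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    d = x.shape[1]
    return d * (eigs ** 2).sum() / eigs.sum() ** 2

x = torch.randn(1000, 128)
print(whitening_metric(x))  # close to 1.0 for white features
```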
2023-11-28 02:16:44,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3307513.3333333335, ans=0.125
2023-11-28 02:17:01,125 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496150
2023-11-28 02:17:15,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3307713.3333333335, ans=0.0
2023-11-28 02:17:22,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0
2023-11-28 02:17:27,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.803e+01 9.436e+01 1.005e+02 1.293e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-28 02:17:33,935 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3200, loss[loss=0.06781, simple_loss=0.08352, pruned_loss=0.01639, audio_tagging_loss=0.009658, over 14323.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09118, pruned_loss=0.01248, audio_tagging_loss=0.008906, over 3041028.84 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:17:39,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3307846.6666666665, ans=0.125
2023-11-28 02:17:45,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3307913.3333333335, ans=0.05
2023-11-28 02:17:58,645 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496200
2023-11-28 02:18:13,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3308046.6666666665, ans=0.125
2023-11-28 02:18:18,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3308046.6666666665, ans=0.125
2023-11-28 02:18:19,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3308113.3333333335, ans=0.125
2023-11-28 02:18:23,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3308113.3333333335, ans=0.0
2023-11-28 02:18:26,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3308113.3333333335, ans=0.0
2023-11-28 02:18:31,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3308180.0, ans=0.1
2023-11-28 02:18:32,209 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3250, loss[loss=0.06029, simple_loss=0.08115, pruned_loss=0.01059, audio_tagging_loss=0.009125, over 14459.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09102, pruned_loss=0.01255, audio_tagging_loss=0.008964, over 3034353.03 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0
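The scaling.py:213 ScheduledFloat lines record the current value (ans=...) of hyperparameters that follow a schedule in batch_count; in icefall these are piecewise-linear in the batch count, which is why by batch_count ≈ 3.3e6 most skip rates have settled at their final values (ans=0.0). A minimal sketch of such a schedule; the breakpoints in the demo are made up for illustration:

```python
# Sketch of a piecewise-linear float schedule over batch_count,
# in the spirit of ScheduledFloat. Breakpoints below are illustrative.
def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            # Linear interpolation between adjacent breakpoints.
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint: hold the final value

# A skip-rate that decays 0.2 -> 0.0 early in training stays at 0.0 here:
print(scheduled_float(3308180.0, [(0.0, 0.2), (4000.0, 0.0)]))  # 0.0
```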
2023-11-28 02:18:35,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3308180.0, ans=0.1
2023-11-28 02:18:56,699 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496250
2023-11-28 02:19:13,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3308380.0, ans=0.09899494936611666
2023-11-28 02:19:23,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.753e+01 9.382e+01 9.909e+01 1.200e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-28 02:19:29,809 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3300, loss[loss=0.0689, simple_loss=0.09185, pruned_loss=0.01338, audio_tagging_loss=0.00959, over 15182.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09037, pruned_loss=0.01244, audio_tagging_loss=0.009085, over 3043347.71 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:19:54,725 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496300
2023-11-28 02:20:13,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0
2023-11-28 02:20:14,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3308713.3333333335, ans=0.125
2023-11-28 02:20:27,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3308846.6666666665, ans=0.5
2023-11-28 02:20:28,041 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3350, loss[loss=0.04889, simple_loss=0.06186, pruned_loss=0.004778, audio_tagging_loss=0.01318, over 15012.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08911, pruned_loss=0.01216, audio_tagging_loss=0.009003, over 3040346.83 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:20:38,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3308913.3333333335, ans=0.0
2023-11-28 02:20:42,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3308913.3333333335, ans=0.125
2023-11-28 02:20:52,523 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496350
2023-11-28 02:21:19,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.880e+01 9.434e+01 1.005e+02 1.295e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-28 02:21:25,798 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3400, loss[loss=0.07016, simple_loss=0.09876, pruned_loss=0.01226, audio_tagging_loss=0.008523, over 15994.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09073, pruned_loss=0.01238, audio_tagging_loss=0.008859, over 3042011.01 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 02:21:28,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=15.0
2023-11-28 02:21:31,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.10 vs.
limit=12.0 2023-11-28 02:21:35,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3309180.0, ans=0.0 2023-11-28 02:21:49,494 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496400 2023-11-28 02:22:03,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3309380.0, ans=0.125 2023-11-28 02:22:04,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3309380.0, ans=0.05 2023-11-28 02:22:14,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3309446.6666666665, ans=0.2 2023-11-28 02:22:23,522 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3450, loss[loss=0.06482, simple_loss=0.08248, pruned_loss=0.01176, audio_tagging_loss=0.01182, over 14149.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08999, pruned_loss=0.01225, audio_tagging_loss=0.008791, over 3036233.09 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:22:28,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309513.3333333335, ans=0.1 2023-11-28 02:22:30,292 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:22:35,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3309580.0, ans=0.125 2023-11-28 02:22:48,151 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496450 2023-11-28 02:22:59,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3309713.3333333335, ans=0.125 2023-11-28 02:23:04,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3309713.3333333335, ans=0.125 2023-11-28 02:23:13,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.800e+01 9.317e+01 1.017e+02 1.307e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 02:23:16,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3309780.0, ans=0.2 2023-11-28 02:23:20,396 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3500, loss[loss=0.06465, simple_loss=0.08411, pruned_loss=0.0112, audio_tagging_loss=0.0114, over 15604.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08997, pruned_loss=0.01223, audio_tagging_loss=0.008694, over 3037623.81 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:23:36,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2023-11-28 02:23:44,973 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496500 2023-11-28 02:23:52,029 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 02:24:18,474 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3550, loss[loss=0.06769, simple_loss=0.09272, pruned_loss=0.01232, audio_tagging_loss=0.009007, over 14855.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08949, pruned_loss=0.01224, audio_tagging_loss=0.008645, over 3044636.31 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:24:18,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3310180.0, ans=0.125 2023-11-28 02:24:23,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3310180.0, ans=0.1 2023-11-28 02:24:25,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=22.5 2023-11-28 02:24:27,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3310180.0, ans=0.125 2023-11-28 02:24:42,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496550 2023-11-28 02:24:47,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2023-11-28 02:24:55,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3310380.0, ans=0.0 2023-11-28 02:25:08,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.721e+01 9.297e+01 1.018e+02 1.196e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 02:25:15,739 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3600, loss[loss=0.07445, simple_loss=0.0983, pruned_loss=0.01675, audio_tagging_loss=0.008547, over 14898.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08951, pruned_loss=0.01232, audio_tagging_loss=0.008676, over 3056617.78 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:25:39,236 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496600 2023-11-28 02:25:50,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3310713.3333333335, ans=0.125 2023-11-28 02:25:57,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.51 vs. limit=15.0 2023-11-28 02:25:59,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3310713.3333333335, ans=0.1 2023-11-28 02:26:12,331 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3650, loss[loss=0.07811, simple_loss=0.1025, pruned_loss=0.01821, audio_tagging_loss=0.008627, over 15159.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09043, pruned_loss=0.01258, audio_tagging_loss=0.008583, over 3060595.08 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:26:18,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.67 vs. 
limit=12.0 2023-11-28 02:26:25,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3310913.3333333335, ans=0.1 2023-11-28 02:26:33,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3310913.3333333335, ans=0.125 2023-11-28 02:26:36,643 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496650 2023-11-28 02:26:39,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=12.0 2023-11-28 02:26:54,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5 2023-11-28 02:26:56,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2023-11-28 02:26:57,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.46 vs. limit=15.0 2023-11-28 02:27:03,701 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.594e+01 9.332e+01 1.003e+02 1.328e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 02:27:06,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3311113.3333333335, ans=0.125 2023-11-28 02:27:09,747 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3700, loss[loss=0.05859, simple_loss=0.08707, pruned_loss=0.008975, audio_tagging_loss=0.006077, over 16280.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09008, pruned_loss=0.01254, audio_tagging_loss=0.008616, over 3059719.10 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:27:18,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2023-11-28 02:27:20,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3311246.6666666665, ans=0.2 2023-11-28 02:27:33,991 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496700 2023-11-28 02:27:35,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3311313.3333333335, ans=0.125 2023-11-28 02:27:58,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3311446.6666666665, ans=0.0 2023-11-28 02:28:03,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.82 vs. limit=12.0 2023-11-28 02:28:07,637 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3750, loss[loss=0.07491, simple_loss=0.09979, pruned_loss=0.01305, audio_tagging_loss=0.01198, over 15983.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0909, pruned_loss=0.01264, audio_tagging_loss=0.008624, over 3059259.42 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:28:13,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. 
limit=15.0 2023-11-28 02:28:23,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3311580.0, ans=0.0 2023-11-28 02:28:23,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3311580.0, ans=0.025 2023-11-28 02:28:30,769 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496750 2023-11-28 02:28:32,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2023-11-28 02:28:47,234 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:28:58,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.781e+01 9.240e+01 9.958e+01 1.596e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 02:29:04,356 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3800, loss[loss=0.06629, simple_loss=0.09564, pruned_loss=0.009938, audio_tagging_loss=0.008528, over 15346.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09119, pruned_loss=0.01262, audio_tagging_loss=0.008598, over 3059956.35 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:29:04,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.0 2023-11-28 02:29:05,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3311846.6666666665, ans=0.125 2023-11-28 02:29:17,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2023-11-28 02:29:22,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311913.3333333335, ans=0.1 2023-11-28 02:29:28,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496800 2023-11-28 02:29:36,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3311980.0, ans=0.125 2023-11-28 02:29:51,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2023-11-28 02:29:55,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=22.5 2023-11-28 02:29:56,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312113.3333333335, ans=0.1 2023-11-28 02:30:01,646 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3850, loss[loss=0.08917, simple_loss=0.1155, pruned_loss=0.0216, audio_tagging_loss=0.009814, over 15171.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09148, pruned_loss=0.01264, audio_tagging_loss=0.008637, over 3054565.51 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:30:01,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3312180.0, ans=0.1 2023-11-28 02:30:06,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2023-11-28 02:30:12,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3312246.6666666665, ans=0.0 2023-11-28 02:30:26,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496850 2023-11-28 02:30:29,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3312313.3333333335, ans=0.0 2023-11-28 02:30:31,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3312313.3333333335, ans=0.0 2023-11-28 02:30:53,142 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 8.894e+01 9.500e+01 1.019e+02 1.780e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 02:30:59,259 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3900, loss[loss=0.05797, simple_loss=0.07935, pruned_loss=0.009947, audio_tagging_loss=0.008346, over 15641.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09155, pruned_loss=0.01264, audio_tagging_loss=0.008692, over 3061134.80 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:30:59,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3312513.3333333335, ans=0.2 2023-11-28 02:31:22,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496900 2023-11-28 02:31:34,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3312713.3333333335, ans=0.0 2023-11-28 02:31:34,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3312713.3333333335, ans=0.07 2023-11-28 02:31:56,452 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3950, loss[loss=0.0648, simple_loss=0.08497, pruned_loss=0.01445, audio_tagging_loss=0.007863, over 16090.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09153, pruned_loss=0.01269, audio_tagging_loss=0.008772, over 3069201.95 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:32:06,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312913.3333333335, ans=0.1 2023-11-28 02:32:07,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3312913.3333333335, ans=10.0 2023-11-28 02:32:11,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3312913.3333333335, ans=0.125 2023-11-28 02:32:13,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.32 vs. 
limit=15.0 2023-11-28 02:32:19,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496950 2023-11-28 02:32:19,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3312980.0, ans=0.125 2023-11-28 02:32:23,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3312980.0, ans=0.125 2023-11-28 02:32:28,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3312980.0, ans=0.125 2023-11-28 02:32:39,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2023-11-28 02:32:46,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.841e+01 9.484e+01 1.039e+02 1.407e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 02:32:52,895 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4000, loss[loss=0.05341, simple_loss=0.07067, pruned_loss=0.01017, audio_tagging_loss=0.007903, over 15296.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09122, pruned_loss=0.01267, audio_tagging_loss=0.008894, over 3064139.21 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:32:55,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3313180.0, ans=0.125 2023-11-28 02:33:10,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3313246.6666666665, ans=0.2 2023-11-28 02:33:11,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3313246.6666666665, ans=0.125 2023-11-28 02:33:17,207 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497000 2023-11-28 02:33:23,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3313313.3333333335, ans=0.125 2023-11-28 02:33:26,615 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:33:43,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2023-11-28 02:33:45,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3313446.6666666665, ans=0.035 2023-11-28 02:33:49,966 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4050, loss[loss=0.08186, simple_loss=0.1163, pruned_loss=0.01527, audio_tagging_loss=0.008444, over 14501.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09093, pruned_loss=0.01258, audio_tagging_loss=0.008976, over 3065957.57 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:33:53,185 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 02:33:56,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3313513.3333333335, ans=0.125 2023-11-28 02:33:57,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3313513.3333333335, ans=0.0 2023-11-28 02:34:01,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3313580.0, ans=0.2 2023-11-28 02:34:02,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2023-11-28 02:34:10,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-28 02:34:14,324 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497050 2023-11-28 02:34:33,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3313713.3333333335, ans=0.125 2023-11-28 02:34:40,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3313780.0, ans=0.0 2023-11-28 02:34:42,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.798e+01 9.309e+01 1.006e+02 1.878e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 02:34:47,161 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4100, loss[loss=0.06543, simple_loss=0.08695, pruned_loss=0.01396, audio_tagging_loss=0.007996, over 14503.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09115, pruned_loss=0.01257, audio_tagging_loss=0.008948, over 3063943.40 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:34:55,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3313846.6666666665, ans=0.125 2023-11-28 02:35:05,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3313913.3333333335, ans=0.125 2023-11-28 02:35:10,940 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497100 2023-11-28 02:35:20,994 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:35:21,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3314046.6666666665, ans=0.0 2023-11-28 02:35:41,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3314113.3333333335, ans=0.07 2023-11-28 02:35:43,995 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4150, loss[loss=0.06886, simple_loss=0.09941, pruned_loss=0.01094, audio_tagging_loss=0.008217, over 16154.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0909, pruned_loss=0.01251, audio_tagging_loss=0.008885, over 3054250.72 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:36:08,928 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497150 2023-11-28 02:36:23,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3314380.0, ans=0.125 2023-11-28 02:36:26,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-28 02:36:27,019 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:36:29,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3314446.6666666665, ans=0.0 2023-11-28 02:36:31,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3314446.6666666665, ans=0.2 2023-11-28 02:36:31,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.36 vs. limit=22.5 2023-11-28 02:36:37,304 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.734e+01 9.392e+01 9.837e+01 1.224e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 02:36:41,683 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4200, loss[loss=0.07378, simple_loss=0.09495, pruned_loss=0.01837, audio_tagging_loss=0.007935, over 14811.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09152, pruned_loss=0.01249, audio_tagging_loss=0.008641, over 3054377.22 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:37:06,191 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497200 2023-11-28 02:37:40,171 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4250, loss[loss=0.0563, simple_loss=0.07604, pruned_loss=0.008, audio_tagging_loss=0.01028, over 13890.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.0909, pruned_loss=0.01262, audio_tagging_loss=0.008626, over 3052731.50 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:37:49,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3314846.6666666665, ans=0.0 2023-11-28 02:37:50,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=12.0 2023-11-28 02:37:50,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.88 vs. 
limit=22.5 2023-11-28 02:37:54,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3314913.3333333335, ans=0.125 2023-11-28 02:38:03,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3314980.0, ans=0.125 2023-11-28 02:38:04,123 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497250 2023-11-28 02:38:31,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315113.3333333335, ans=0.1 2023-11-28 02:38:32,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.722e+01 9.477e+01 1.017e+02 1.335e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 02:38:36,918 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4300, loss[loss=0.06497, simple_loss=0.08885, pruned_loss=0.01076, audio_tagging_loss=0.009782, over 16010.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09105, pruned_loss=0.01267, audio_tagging_loss=0.008647, over 3046625.82 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:38:42,644 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:38:53,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-28 02:38:58,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3315313.3333333335, ans=0.07 2023-11-28 02:38:59,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2023-11-28 02:38:59,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-11-28 02:39:01,034 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497300 2023-11-28 02:39:29,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3315446.6666666665, ans=0.125 2023-11-28 02:39:33,972 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4350, loss[loss=0.06488, simple_loss=0.0804, pruned_loss=0.01425, audio_tagging_loss=0.01042, over 16458.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09124, pruned_loss=0.01269, audio_tagging_loss=0.008551, over 3052410.73 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:39:36,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3315513.3333333335, ans=0.125 2023-11-28 02:39:39,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3315513.3333333335, ans=0.2 2023-11-28 02:39:46,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0 2023-11-28 02:39:58,394 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497350 2023-11-28 02:39:59,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.15 vs. 
limit=12.0 2023-11-28 02:40:21,040 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:40:26,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.957e+01 9.552e+01 1.043e+02 1.269e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 02:40:26,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3315780.0, ans=0.2 2023-11-28 02:40:31,054 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4400, loss[loss=0.07203, simple_loss=0.09615, pruned_loss=0.01638, audio_tagging_loss=0.007571, over 16365.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09141, pruned_loss=0.01267, audio_tagging_loss=0.008539, over 3045402.64 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:40:41,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3315913.3333333335, ans=0.2 2023-11-28 02:40:41,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2023-11-28 02:40:55,828 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497400 2023-11-28 02:41:05,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3316046.6666666665, ans=0.1 2023-11-28 02:41:17,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0 2023-11-28 02:41:20,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3316113.3333333335, ans=0.09899494936611666 2023-11-28 02:41:28,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3316180.0, ans=0.2 2023-11-28 02:41:29,272 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4450, loss[loss=0.06974, simple_loss=0.09549, pruned_loss=0.01278, audio_tagging_loss=0.009221, over 15254.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09053, pruned_loss=0.01255, audio_tagging_loss=0.008511, over 3036542.69 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:41:45,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3316246.6666666665, ans=0.0 2023-11-28 02:41:53,486 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497450 2023-11-28 02:41:56,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=6.0 2023-11-28 02:42:04,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3316380.0, ans=0.125 2023-11-28 02:42:05,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3316380.0, ans=0.0 2023-11-28 02:42:22,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.078e+01 9.731e+01 1.036e+02 1.394e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 02:42:27,247 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4500, loss[loss=0.05366, simple_loss=0.07715, pruned_loss=0.00756, audio_tagging_loss=0.007523, over 14541.00 frames. 
], tot_loss[loss=0.06649, simple_loss=0.09078, pruned_loss=0.01263, audio_tagging_loss=0.00847, over 3041579.40 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:42:34,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.03 vs. limit=22.5 2023-11-28 02:42:49,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3316646.6666666665, ans=0.0 2023-11-28 02:42:50,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497500 2023-11-28 02:42:55,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3316646.6666666665, ans=0.125 2023-11-28 02:43:24,680 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4550, loss[loss=0.05587, simple_loss=0.07454, pruned_loss=0.009309, audio_tagging_loss=0.009291, over 15552.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0904, pruned_loss=0.0125, audio_tagging_loss=0.008513, over 3044365.59 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:43:26,005 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:43:32,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3316846.6666666665, ans=0.1 2023-11-28 02:43:34,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3316913.3333333335, ans=0.125 2023-11-28 02:43:35,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2023-11-28 02:43:41,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3316913.3333333335, ans=0.125 2023-11-28 02:43:49,307 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497550 2023-11-28 02:43:56,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3316980.0, ans=0.125 2023-11-28 02:44:07,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3317046.6666666665, ans=0.0 2023-11-28 02:44:08,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3317046.6666666665, ans=0.0 2023-11-28 02:44:09,501 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 02:44:15,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3317113.3333333335, ans=0.125 2023-11-28 02:44:18,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.674e+01 9.170e+01 9.991e+01 1.281e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-28 02:44:21,528 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4600, loss[loss=0.07163, simple_loss=0.09346, pruned_loss=0.01659, audio_tagging_loss=0.008304, over 15664.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09023, pruned_loss=0.01246, audio_tagging_loss=0.008561, over 3034706.54 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:44:25,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3317180.0, ans=0.1 2023-11-28 02:44:46,310 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497600 2023-11-28 02:44:49,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3317313.3333333335, ans=0.125 2023-11-28 02:44:56,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3317380.0, ans=15.0 2023-11-28 02:45:01,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3317380.0, ans=10.0 2023-11-28 02:45:18,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3317446.6666666665, ans=0.125 2023-11-28 02:45:18,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-28 02:45:20,488 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4650, loss[loss=0.06874, simple_loss=0.08199, pruned_loss=0.01765, audio_tagging_loss=0.0101, over 14510.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09009, pruned_loss=0.01241, audio_tagging_loss=0.008676, over 3034141.93 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:45:26,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3317513.3333333335, ans=0.2 2023-11-28 02:45:44,317 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497650 2023-11-28 02:46:14,344 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.773e+01 9.249e+01 1.003e+02 1.204e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-28 02:46:17,629 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4700, loss[loss=0.07034, simple_loss=0.09047, pruned_loss=0.01396, audio_tagging_loss=0.01115, over 13994.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08987, pruned_loss=0.01245, audio_tagging_loss=0.008766, over 3036150.41 frames. 
], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:46:28,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3317913.3333333335, ans=0.1 2023-11-28 02:46:35,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3317913.3333333335, ans=0.125 2023-11-28 02:46:42,346 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497700 2023-11-28 02:46:43,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3317980.0, ans=0.5 2023-11-28 02:47:00,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3318046.6666666665, ans=0.125 2023-11-28 02:47:14,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2023-11-28 02:47:14,943 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4750, loss[loss=0.06349, simple_loss=0.08705, pruned_loss=0.01056, audio_tagging_loss=0.009405, over 16215.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08902, pruned_loss=0.01232, audio_tagging_loss=0.008969, over 3033026.46 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:47:15,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3318180.0, ans=0.0 2023-11-28 02:47:16,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3318180.0, ans=0.0 2023-11-28 02:47:19,509 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:47:39,362 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497750 2023-11-28 02:47:57,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3318380.0, ans=0.125 2023-11-28 02:48:08,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.846e+01 9.343e+01 1.002e+02 1.233e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:48:13,296 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4800, loss[loss=0.07996, simple_loss=0.1138, pruned_loss=0.01442, audio_tagging_loss=0.008639, over 15565.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.0898, pruned_loss=0.01241, audio_tagging_loss=0.009011, over 3039692.44 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:48:14,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3318513.3333333335, ans=0.125 2023-11-28 02:48:19,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3318513.3333333335, ans=0.0 2023-11-28 02:48:37,198 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497800 2023-11-28 02:48:42,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3318646.6666666665, ans=22.5 2023-11-28 02:48:43,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3318646.6666666665, ans=0.125 2023-11-28 02:48:47,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3318713.3333333335, ans=0.0 2023-11-28 02:48:57,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3318713.3333333335, ans=0.2 2023-11-28 02:49:10,418 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4850, loss[loss=0.07596, simple_loss=0.104, pruned_loss=0.01456, audio_tagging_loss=0.00938, over 15143.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.0905, pruned_loss=0.01255, audio_tagging_loss=0.009012, over 3038202.40 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:49:34,154 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497850 2023-11-28 02:49:56,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3319113.3333333335, ans=0.1 2023-11-28 02:49:57,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3319113.3333333335, ans=0.125 2023-11-28 02:50:03,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3319113.3333333335, ans=0.125 2023-11-28 02:50:05,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3319113.3333333335, ans=0.125 2023-11-28 02:50:05,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.681e+01 9.347e+01 1.000e+02 1.245e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:50:08,170 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4900, loss[loss=0.06243, simple_loss=0.09113, pruned_loss=0.01121, audio_tagging_loss=0.005656, over 15173.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09074, pruned_loss=0.01245, audio_tagging_loss=0.008919, over 3042077.28 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:50:33,066 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497900 2023-11-28 02:51:05,901 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4950, loss[loss=0.08148, simple_loss=0.1075, pruned_loss=0.02049, audio_tagging_loss=0.007258, over 15612.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09025, pruned_loss=0.01243, audio_tagging_loss=0.008855, over 3036847.00 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:51:09,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.02 vs. 
limit=15.0 2023-11-28 02:51:25,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-28 02:51:30,991 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497950 2023-11-28 02:51:50,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3319713.3333333335, ans=0.2 2023-11-28 02:51:56,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3319780.0, ans=0.125 2023-11-28 02:52:01,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.571e+01 9.206e+01 9.727e+01 1.276e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 02:52:04,052 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5000, loss[loss=0.08147, simple_loss=0.1139, pruned_loss=0.01676, audio_tagging_loss=0.007765, over 14781.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09018, pruned_loss=0.0124, audio_tagging_loss=0.008686, over 3038484.34 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:52:11,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3319846.6666666665, ans=0.0 2023-11-28 02:52:13,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3319846.6666666665, ans=0.125 2023-11-28 02:52:27,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498000 2023-11-28 02:52:49,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3320113.3333333335, ans=0.125 2023-11-28 02:52:50,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-11-28 02:52:56,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3320113.3333333335, ans=0.125 2023-11-28 02:52:59,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3320113.3333333335, ans=0.1 2023-11-28 02:53:01,675 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5050, loss[loss=0.06415, simple_loss=0.09554, pruned_loss=0.01016, audio_tagging_loss=0.006221, over 16575.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09099, pruned_loss=0.01237, audio_tagging_loss=0.008653, over 3039399.48 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:53:07,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3320180.0, ans=0.0 2023-11-28 02:53:18,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-28 02:53:24,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.06 vs. 
limit=15.0 2023-11-28 02:53:25,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498050 2023-11-28 02:53:39,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3320380.0, ans=0.2 2023-11-28 02:53:53,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2023-11-28 02:53:56,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.785e+01 9.412e+01 9.952e+01 1.191e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 02:53:56,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3320446.6666666665, ans=0.125 2023-11-28 02:53:58,587 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5100, loss[loss=0.06567, simple_loss=0.08535, pruned_loss=0.01197, audio_tagging_loss=0.01103, over 15246.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08972, pruned_loss=0.01228, audio_tagging_loss=0.008602, over 3040351.19 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:54:00,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3320513.3333333335, ans=0.125 2023-11-28 02:54:02,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3320513.3333333335, ans=0.125 2023-11-28 02:54:22,941 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:54:23,932 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498100 2023-11-28 02:54:38,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3320713.3333333335, ans=0.0 2023-11-28 02:54:39,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3320713.3333333335, ans=0.05 2023-11-28 02:54:44,499 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:54:56,888 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5150, loss[loss=0.05797, simple_loss=0.07893, pruned_loss=0.009749, audio_tagging_loss=0.008761, over 15665.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08977, pruned_loss=0.01237, audio_tagging_loss=0.008553, over 3045974.04 frames. 
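
The [optim.py:476] messages above summarize recent gradient norms as quartiles (min, 25%, median, 75%, max) plus a clipping threshold, and the printed values are consistent with threshold = Clipping_scale * median: at 02:53:56, 2.0 * 9.412e+01 = 1.882e+02. A minimal sketch of that bookkeeping, assuming a plain window of per-step gradient norms rather than the real optimizer state:

    import torch

    def grad_norm_quartiles(grad_norms, clipping_scale=2.0):
        # Hedged sketch, not the actual optim.py code: summarize a window of
        # recent gradient norms and derive the clipping threshold as
        # clipping_scale * median, as the logged numbers suggest.
        norms = torch.tensor(grad_norms, dtype=torch.float32)
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return q, threshold, percent_clipped

No norm in the logged windows exceeds twice the median, hence percent-clipped=0.0 throughout this stretch of training.
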
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:55:18,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3320913.3333333335, ans=0.95 2023-11-28 02:55:19,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3320980.0, ans=0.125 2023-11-28 02:55:20,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3320980.0, ans=0.125 2023-11-28 02:55:21,127 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498150 2023-11-28 02:55:38,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3321046.6666666665, ans=0.025 2023-11-28 02:55:53,306 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.740e+01 9.410e+01 1.002e+02 1.466e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 02:55:54,468 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5200, loss[loss=0.076, simple_loss=0.1041, pruned_loss=0.01627, audio_tagging_loss=0.007665, over 15061.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08934, pruned_loss=0.01219, audio_tagging_loss=0.008565, over 3050252.02 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:56:18,528 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498200 2023-11-28 02:56:22,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3321313.3333333335, ans=0.2 2023-11-28 02:56:29,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3321380.0, ans=0.025 2023-11-28 02:56:45,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2023-11-28 02:56:51,813 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5250, loss[loss=0.048, simple_loss=0.05858, pruned_loss=0.00737, audio_tagging_loss=0.01134, over 15934.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08969, pruned_loss=0.01222, audio_tagging_loss=0.008537, over 3052884.88 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:56:54,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3321513.3333333335, ans=0.2 2023-11-28 02:57:00,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3321513.3333333335, ans=0.125 2023-11-28 02:57:02,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.19 vs. limit=22.5 2023-11-28 02:57:09,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.65 vs. 
limit=15.0 2023-11-28 02:57:10,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321580.0, ans=0.1 2023-11-28 02:57:11,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3321580.0, ans=0.0 2023-11-28 02:57:15,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3321646.6666666665, ans=0.0 2023-11-28 02:57:16,169 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498250 2023-11-28 02:57:16,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-28 02:57:20,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-28 02:57:24,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-28 02:57:48,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.904e+01 9.487e+01 1.032e+02 1.355e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 02:57:49,441 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5300, loss[loss=0.06872, simple_loss=0.09409, pruned_loss=0.01112, audio_tagging_loss=0.01056, over 14562.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09034, pruned_loss=0.01245, audio_tagging_loss=0.008495, over 3052189.13 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:58:08,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3321913.3333333335, ans=0.0 2023-11-28 02:58:13,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0 2023-11-28 02:58:13,560 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498300 2023-11-28 02:58:42,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3322113.3333333335, ans=0.125 2023-11-28 02:58:47,135 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5350, loss[loss=0.06073, simple_loss=0.07493, pruned_loss=0.0106, audio_tagging_loss=0.01266, over 16015.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.0903, pruned_loss=0.01247, audio_tagging_loss=0.00858, over 3050947.10 frames. 
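
The recurring [scaling.py:213] messages print the current value (the "ans" field) of a scheduled hyperparameter such as a dropout rate, skip rate, or balancer probability at the given batch_count. A self-contained sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real ScheduledFloat is a module-attached object, so this only mirrors the scheduling arithmetic:

    class ScheduledFloatSketch:
        # Hedged sketch of a batch-count schedule: breakpoints are
        # (batch_count, value) pairs, linearly interpolated and clamped
        # at both ends.
        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # A rate decaying from 0.3 to 0.1 over the first 20k batches flattened out
    # long before batch_count ~ 3.3e6, hence the constant ans values above.
    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(3321580.0) == 0.1
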
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:58:48,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3322180.0, ans=0.125 2023-11-28 02:59:11,076 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498350 2023-11-28 02:59:24,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3322380.0, ans=0.1 2023-11-28 02:59:34,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3322446.6666666665, ans=0.125 2023-11-28 02:59:42,882 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.664e+01 9.180e+01 9.721e+01 1.287e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-28 02:59:44,032 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5400, loss[loss=0.05511, simple_loss=0.07414, pruned_loss=0.009615, audio_tagging_loss=0.008423, over 15792.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08952, pruned_loss=0.01218, audio_tagging_loss=0.008572, over 3046643.23 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:59:46,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-11-28 03:00:08,042 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498400 2023-11-28 03:00:09,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3322646.6666666665, ans=0.0 2023-11-28 03:00:14,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3322646.6666666665, ans=0.1 2023-11-28 03:00:15,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3322646.6666666665, ans=0.0 2023-11-28 03:00:19,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3322713.3333333335, ans=0.09899494936611666 2023-11-28 03:00:40,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3322780.0, ans=0.125 2023-11-28 03:00:42,007 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5450, loss[loss=0.06946, simple_loss=0.09805, pruned_loss=0.01147, audio_tagging_loss=0.008966, over 14433.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09036, pruned_loss=0.01226, audio_tagging_loss=0.008693, over 3050696.19 frames. ], batch size: 52, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:00:47,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.59 vs. 
limit=12.0 2023-11-28 03:00:48,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3322846.6666666665, ans=0.5 2023-11-28 03:00:49,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3322846.6666666665, ans=0.125 2023-11-28 03:00:54,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3322913.3333333335, ans=0.125 2023-11-28 03:01:02,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.59 vs. limit=22.5 2023-11-28 03:01:06,726 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498450 2023-11-28 03:01:16,864 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:01:31,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3323113.3333333335, ans=0.1 2023-11-28 03:01:38,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.881e+01 9.599e+01 1.024e+02 1.269e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 03:01:39,545 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5500, loss[loss=0.06322, simple_loss=0.08583, pruned_loss=0.0106, audio_tagging_loss=0.009715, over 14506.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09081, pruned_loss=0.01242, audio_tagging_loss=0.008738, over 3046407.45 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:01:58,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3323246.6666666665, ans=0.1 2023-11-28 03:02:04,116 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498500 2023-11-28 03:02:05,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3323313.3333333335, ans=0.125 2023-11-28 03:02:18,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3323380.0, ans=0.125 2023-11-28 03:02:37,295 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5550, loss[loss=0.0767, simple_loss=0.1121, pruned_loss=0.01293, audio_tagging_loss=0.007719, over 15554.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09105, pruned_loss=0.01242, audio_tagging_loss=0.00882, over 3055295.17 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:02:39,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.74 vs. 
limit=15.0 2023-11-28 03:02:54,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3323580.0, ans=0.0 2023-11-28 03:02:57,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3323580.0, ans=0.05 2023-11-28 03:03:01,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498550 2023-11-28 03:03:02,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3323646.6666666665, ans=0.1 2023-11-28 03:03:14,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3323713.3333333335, ans=0.1 2023-11-28 03:03:15,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3323713.3333333335, ans=0.125 2023-11-28 03:03:33,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.559e+01 9.219e+01 9.829e+01 1.565e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-28 03:03:34,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3323846.6666666665, ans=0.1 2023-11-28 03:03:35,076 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5600, loss[loss=0.0607, simple_loss=0.08428, pruned_loss=0.008469, audio_tagging_loss=0.01009, over 16621.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09151, pruned_loss=0.01239, audio_tagging_loss=0.008924, over 3058502.55 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:03:59,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498600 2023-11-28 03:04:03,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2023-11-28 03:04:04,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3323980.0, ans=0.0 2023-11-28 03:04:17,625 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:04:31,805 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5650, loss[loss=0.06704, simple_loss=0.0861, pruned_loss=0.01551, audio_tagging_loss=0.008477, over 15408.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09092, pruned_loss=0.01255, audio_tagging_loss=0.008988, over 3061688.26 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:04:37,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. 
limit=15.0 2023-11-28 03:04:40,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3324180.0, ans=0.125 2023-11-28 03:04:47,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3324246.6666666665, ans=0.2 2023-11-28 03:04:55,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498650 2023-11-28 03:05:11,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3324380.0, ans=0.125 2023-11-28 03:05:25,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3324446.6666666665, ans=0.0 2023-11-28 03:05:28,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.782e+01 9.473e+01 1.042e+02 1.222e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 03:05:29,873 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5700, loss[loss=0.07485, simple_loss=0.1036, pruned_loss=0.01407, audio_tagging_loss=0.008987, over 15920.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09137, pruned_loss=0.01259, audio_tagging_loss=0.008899, over 3061005.70 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:05:31,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3324513.3333333335, ans=15.0 2023-11-28 03:05:36,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=22.5 2023-11-28 03:05:40,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3324580.0, ans=0.5 2023-11-28 03:05:49,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3324580.0, ans=0.2 2023-11-28 03:05:53,876 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498700 2023-11-28 03:05:59,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3324646.6666666665, ans=0.125 2023-11-28 03:06:09,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3324713.3333333335, ans=0.0 2023-11-28 03:06:20,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3324780.0, ans=0.125 2023-11-28 03:06:26,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3324846.6666666665, ans=0.09899494936611666 2023-11-28 03:06:27,541 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5750, loss[loss=0.06327, simple_loss=0.08678, pruned_loss=0.009446, audio_tagging_loss=0.01043, over 13725.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09125, pruned_loss=0.01251, audio_tagging_loss=0.008826, over 3056174.96 frames. 
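
The [train_asr.py:1481] WARNING above (like the earlier one for cut _II2Klfnn4Y) drops cuts whose labels cannot be aligned by the transducer: after subsampling, the 100 input frames become 23, which is fewer than the 24 BPE tokens of the placeholder transcript. A sketch of that predicate, with an illustrative subsampling formula that reproduces the logged 100 -> 23; the exact expression in train_asr.py may differ:

    def should_exclude_cut(num_frames: int, num_tokens: int) -> bool:
        # Hedged sketch: a transducer cannot emit more labels than it has
        # subsampled frames, so such cuts are excluded from training.
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2  # 100 -> 23
        return frames_after_subsampling < num_tokens

    # The warned AudioSet cut: 100 frames but 24 tokens of dummy text -> excluded.
    assert should_exclude_cut(100, 24)
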
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:06:30,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3324846.6666666665, ans=0.0 2023-11-28 03:06:44,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3324913.3333333335, ans=0.1 2023-11-28 03:06:44,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=12.0 2023-11-28 03:06:50,997 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498750 2023-11-28 03:07:22,755 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.740e+01 9.291e+01 9.936e+01 1.231e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 03:07:23,844 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5800, loss[loss=0.08519, simple_loss=0.1183, pruned_loss=0.01721, audio_tagging_loss=0.008812, over 15582.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09158, pruned_loss=0.01261, audio_tagging_loss=0.008717, over 3046355.26 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:07:38,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3325246.6666666665, ans=0.125 2023-11-28 03:07:41,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3325246.6666666665, ans=0.2 2023-11-28 03:07:43,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3325246.6666666665, ans=0.0 2023-11-28 03:07:48,071 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498800 2023-11-28 03:07:59,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3325380.0, ans=0.0 2023-11-28 03:08:08,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3325380.0, ans=0.125 2023-11-28 03:08:12,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3325446.6666666665, ans=0.5 2023-11-28 03:08:21,650 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5850, loss[loss=0.05747, simple_loss=0.07329, pruned_loss=0.009177, audio_tagging_loss=0.01164, over 14753.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09109, pruned_loss=0.01259, audio_tagging_loss=0.008652, over 3048920.37 frames. 
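
Each batch line reports three loss components, and the printed totals are consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for the batch 5850 running averages above, 0.5 * 0.09109 + 0.01259 + 0.008652 = 0.06679. A small check of that reconstruction; the 0.5 and implicit 1.0 weights are inferred from the logged numbers, not read from the training code:

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, tagging_scale=1.0):
        # Hedged reconstruction of the 'loss=' field; the scales are
        # assumptions that fit the printed values.
        return (simple_scale * simple_loss + pruned_loss
                + tagging_scale * audio_tagging_loss)

    assert abs(combined_loss(0.09109, 0.01259, 0.008652) - 0.06679) < 1e-5
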
], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:08:24,048 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:08:46,153 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498850 2023-11-28 03:09:10,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3325780.0, ans=0.05 2023-11-28 03:09:18,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.725e+01 9.320e+01 1.016e+02 1.515e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 03:09:18,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3325846.6666666665, ans=0.0 2023-11-28 03:09:19,658 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5900, loss[loss=0.08133, simple_loss=0.1155, pruned_loss=0.01465, audio_tagging_loss=0.008933, over 15280.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09146, pruned_loss=0.01264, audio_tagging_loss=0.008578, over 3042049.56 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:09:25,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=12.0 2023-11-28 03:09:37,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3325913.3333333335, ans=0.125 2023-11-28 03:09:43,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498900 2023-11-28 03:09:48,410 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:09:49,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3325980.0, ans=0.0 2023-11-28 03:09:57,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3326046.6666666665, ans=0.125 2023-11-28 03:10:07,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3326113.3333333335, ans=0.125 2023-11-28 03:10:17,156 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5950, loss[loss=0.07459, simple_loss=0.1057, pruned_loss=0.01538, audio_tagging_loss=0.006371, over 16009.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0905, pruned_loss=0.01252, audio_tagging_loss=0.008611, over 3040080.53 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:10:31,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3326246.6666666665, ans=0.125 2023-11-28 03:10:38,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-28 03:10:40,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498950 2023-11-28 03:10:56,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3326380.0, ans=0.1 2023-11-28 03:10:59,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. 
limit=15.0 2023-11-28 03:11:03,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3326446.6666666665, ans=0.125 2023-11-28 03:11:14,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.682e+01 9.363e+01 1.001e+02 1.313e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 03:11:14,395 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6000, loss[loss=0.05781, simple_loss=0.07371, pruned_loss=0.01246, audio_tagging_loss=0.0085, over 14936.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09008, pruned_loss=0.01247, audio_tagging_loss=0.008633, over 3038507.76 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:11:14,395 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 03:11:33,041 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9700, 3.1249, 2.9209, 3.1590, 3.4117, 2.8161, 3.4224, 2.7300], device='cuda:3') 2023-11-28 03:11:49,933 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05789, simple_loss=0.05056, pruned_loss=0.005172, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 03:11:49,934 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 03:12:05,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3326580.0, ans=0.1 2023-11-28 03:12:13,475 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499000 2023-11-28 03:12:19,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3326646.6666666665, ans=0.125 2023-11-28 03:12:32,239 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:12:46,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3326846.6666666665, ans=0.125 2023-11-28 03:12:46,915 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6050, loss[loss=0.05741, simple_loss=0.07419, pruned_loss=0.009873, audio_tagging_loss=0.01045, over 15244.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.0901, pruned_loss=0.01241, audio_tagging_loss=0.008635, over 3033547.11 frames. 
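
During the validation pass above, [zipformer.py:1877] prints the entropy of each attention head's weight distribution as a health diagnostic: entropies near zero would mean heads have collapsed onto single keys. A sketch of that statistic, assuming weights of shape (num_heads, query_len, key_len) whose rows sum to one; the logged per-head values around 3-4 nats sit below the uniform-attention ceiling of log(key_len):

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # Hedged sketch: per-head entropy of the attention distribution,
        # averaged over query positions.
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
        return ent.mean(dim=-1)                           # one value per head

    # Uniform attention over 64 keys gives log(64) ~= 4.16 nats per head.
    uniform = torch.full((4, 10, 64), 1.0 / 64.0)
    print(attn_weights_entropy(uniform))  # tensor([4.1589, 4.1589, 4.1589, 4.1589])
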
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:12:58,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3326913.3333333335, ans=0.1 2023-11-28 03:13:10,388 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499050 2023-11-28 03:13:18,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3326980.0, ans=0.125 2023-11-28 03:13:18,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3326980.0, ans=0.2 2023-11-28 03:13:18,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3326980.0, ans=0.0 2023-11-28 03:13:27,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3327046.6666666665, ans=0.125 2023-11-28 03:13:34,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2023-11-28 03:13:36,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3327113.3333333335, ans=0.0 2023-11-28 03:13:42,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-28 03:13:44,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.818e+01 9.290e+01 9.982e+01 1.282e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 03:13:44,266 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6100, loss[loss=0.0452, simple_loss=0.056, pruned_loss=0.007239, audio_tagging_loss=0.009957, over 15809.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08978, pruned_loss=0.01231, audio_tagging_loss=0.008619, over 3036225.23 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:13:48,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0 2023-11-28 03:14:01,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.29 vs. limit=12.0 2023-11-28 03:14:08,807 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499100 2023-11-28 03:14:10,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3327313.3333333335, ans=0.125 2023-11-28 03:14:10,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=22.5 2023-11-28 03:14:20,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3327380.0, ans=0.125 2023-11-28 03:14:41,490 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6150, loss[loss=0.08111, simple_loss=0.1102, pruned_loss=0.01848, audio_tagging_loss=0.007542, over 15148.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09088, pruned_loss=0.0127, audio_tagging_loss=0.008598, over 3034813.54 frames. 
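
The grad_scale field in the batch lines (32.0 here, 16.0 and 8.0 elsewhere in this epoch) is the dynamic fp16 loss scale: it is halved whenever a step overflows and grown back after a run of stable steps, which is why it wanders between powers of two. A generic sketch of that loop with torch.cuda.amp; model, optimizer, and batch are placeholders, not the icefall training API:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def fp16_step(model, optimizer, batch):
        # Placeholder training step showing where the logged grad_scale lives.
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)            # placeholder forward + loss
        scaler.scale(loss).backward()      # multiply the loss by the scale
        scaler.step(optimizer)             # unscales grads; skips on inf/nan
        scaler.update()                    # halves the scale on overflow, else
                                           # grows it after enough stable steps
        return scaler.get_scale()
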
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:15:06,118 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499150 2023-11-28 03:15:27,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3327780.0, ans=0.125 2023-11-28 03:15:39,224 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6200, loss[loss=0.0678, simple_loss=0.09315, pruned_loss=0.01456, audio_tagging_loss=0.006669, over 14525.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09012, pruned_loss=0.0126, audio_tagging_loss=0.008653, over 3034995.21 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:15:40,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.163e+01 8.658e+01 9.318e+01 1.003e+02 1.390e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 03:15:44,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3327846.6666666665, ans=10.0 2023-11-28 03:16:02,947 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499200 2023-11-28 03:16:36,596 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6250, loss[loss=0.03408, simple_loss=0.03917, pruned_loss=0.004838, audio_tagging_loss=0.009659, over 14860.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08895, pruned_loss=0.01238, audio_tagging_loss=0.008737, over 3037380.41 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:17:00,552 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499250 2023-11-28 03:17:13,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3328380.0, ans=0.1 2023-11-28 03:17:19,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3328380.0, ans=0.0 2023-11-28 03:17:21,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3328446.6666666665, ans=0.2 2023-11-28 03:17:33,309 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6300, loss[loss=0.06718, simple_loss=0.08786, pruned_loss=0.01317, audio_tagging_loss=0.01009, over 14754.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.0893, pruned_loss=0.01236, audio_tagging_loss=0.008804, over 3038791.97 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:17:34,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 9.160e+01 9.772e+01 1.060e+02 1.350e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-28 03:17:52,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-28 03:17:58,594 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499300 2023-11-28 03:18:02,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.38 vs. 
limit=6.0 2023-11-28 03:18:14,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3328713.3333333335, ans=0.125 2023-11-28 03:18:15,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3328713.3333333335, ans=0.125 2023-11-28 03:18:15,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2023-11-28 03:18:19,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3328780.0, ans=0.125 2023-11-28 03:18:27,925 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:18:31,045 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6350, loss[loss=0.08035, simple_loss=0.1112, pruned_loss=0.01711, audio_tagging_loss=0.007653, over 15744.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08916, pruned_loss=0.01233, audio_tagging_loss=0.008858, over 3039526.71 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:18:43,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3328913.3333333335, ans=0.2 2023-11-28 03:18:48,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3328913.3333333335, ans=0.125 2023-11-28 03:18:52,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3328913.3333333335, ans=0.0 2023-11-28 03:18:55,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499350 2023-11-28 03:19:07,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3329046.6666666665, ans=0.125 2023-11-28 03:19:12,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0 2023-11-28 03:19:29,067 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6400, loss[loss=0.06643, simple_loss=0.08963, pruned_loss=0.01007, audio_tagging_loss=0.01155, over 14840.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0889, pruned_loss=0.01225, audio_tagging_loss=0.009001, over 3034918.11 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:19:30,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.920e+01 9.509e+01 1.018e+02 1.569e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 03:19:52,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3329313.3333333335, ans=0.0 2023-11-28 03:19:52,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-11-28 03:19:52,946 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499400 2023-11-28 03:20:19,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329446.6666666665, ans=0.1 2023-11-28 03:20:26,017 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6450, loss[loss=0.05751, simple_loss=0.0797, pruned_loss=0.01048, audio_tagging_loss=0.007186, over 16200.00 frames. 
], tot_loss[loss=0.06627, simple_loss=0.08948, pruned_loss=0.01248, audio_tagging_loss=0.009051, over 3046194.64 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:20:34,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3329513.3333333335, ans=0.125 2023-11-28 03:20:49,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499450 2023-11-28 03:20:59,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3329713.3333333335, ans=0.125 2023-11-28 03:21:02,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-28 03:21:04,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3329713.3333333335, ans=0.125 2023-11-28 03:21:09,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.53 vs. limit=10.0 2023-11-28 03:21:23,068 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6500, loss[loss=0.0573, simple_loss=0.07985, pruned_loss=0.008226, audio_tagging_loss=0.009144, over 14970.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08913, pruned_loss=0.01243, audio_tagging_loss=0.009009, over 3052124.25 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:21:25,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.737e+01 9.352e+01 9.973e+01 1.217e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 03:21:27,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3329846.6666666665, ans=0.125 2023-11-28 03:21:47,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499500 2023-11-28 03:21:51,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-11-28 03:22:05,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3330046.6666666665, ans=0.125 2023-11-28 03:22:15,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.00 vs. limit=22.5 2023-11-28 03:22:20,279 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6550, loss[loss=0.06177, simple_loss=0.0878, pruned_loss=0.008695, audio_tagging_loss=0.009179, over 15184.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08863, pruned_loss=0.01216, audio_tagging_loss=0.008864, over 3046870.36 frames. 
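
The [scaling.py:1022] Whitening messages log how far a group of activations is from having a white, identity-like covariance; a metric of 1.0 is perfectly white, and the module only penalizes gradients once the metric exceeds the logged limit. A sketch of that metric based on a reading of icefall's scaling.py (details may differ by version):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # Hedged sketch: ratio of the mean diagonal of the squared covariance
        # to the squared mean diagonal of the covariance, per channel group.
        x = x.reshape(-1, x.shape[-1])
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups                   # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)                # centered covariance
        covar = torch.matmul(x.transpose(1, 2), x)         # (groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        meansq_diag = (covar ** 2).sum() / (num_groups * cpg)
        return meansq_diag / (mean_diag ** 2 + 1e-20)

    # White noise scores near 1.0, well under the logged limits of 6.0-22.5.
    print(whitening_metric(torch.randn(1000, 384), num_groups=1))
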
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:22:38,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3330246.6666666665, ans=0.125 2023-11-28 03:22:44,203 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499550 2023-11-28 03:22:48,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3330313.3333333335, ans=0.1 2023-11-28 03:22:49,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3330313.3333333335, ans=0.0 2023-11-28 03:22:55,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3330380.0, ans=0.2 2023-11-28 03:22:57,481 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:22:58,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=22.5 2023-11-28 03:23:07,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3330446.6666666665, ans=0.0 2023-11-28 03:23:08,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3330446.6666666665, ans=0.2 2023-11-28 03:23:16,557 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6600, loss[loss=0.07378, simple_loss=0.0998, pruned_loss=0.01541, audio_tagging_loss=0.008478, over 15601.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08905, pruned_loss=0.01228, audio_tagging_loss=0.008743, over 3053176.76 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:23:19,844 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.958e+01 9.376e+01 9.845e+01 1.305e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 03:23:39,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3330646.6666666665, ans=10.0 2023-11-28 03:23:40,481 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499600 2023-11-28 03:23:41,812 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:23:57,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3330713.3333333335, ans=0.0 2023-11-28 03:24:02,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3330780.0, ans=0.0 2023-11-28 03:24:06,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3330780.0, ans=0.0 2023-11-28 03:24:08,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. limit=10.0 2023-11-28 03:24:14,472 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6650, loss[loss=0.07107, simple_loss=0.1002, pruned_loss=0.01285, audio_tagging_loss=0.008099, over 15533.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08879, pruned_loss=0.01214, audio_tagging_loss=0.008725, over 3050881.47 frames. 
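
Many of the scheduled values being printed (min_positive, max_positive, min_abs, max_abs, prob) belong to balancer modules that keep per-channel activation statistics inside a target range, applying a gradient correction with probability prob when a channel drifts out. A sketch of the two statistics being watched, assuming x has shape (num_frames, num_channels); the real module acts through the backward pass rather than reporting values:

    import torch

    def balancer_stats(x: torch.Tensor):
        # Hedged sketch of what a balancer monitors per channel: the fraction
        # of positive activations (against min_positive/max_positive) and the
        # mean absolute value (against min_abs/max_abs).
        positive_frac = (x > 0.0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return positive_frac, mean_abs

    frac, mag = balancer_stats(torch.randn(1000, 256))
    # Zero-mean noise gives frac ~ 0.5 per channel, inside e.g. min_positive=0.05.
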
], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:24:14,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330846.6666666665, ans=0.1 2023-11-28 03:24:18,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3330846.6666666665, ans=0.0 2023-11-28 03:24:27,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3330913.3333333335, ans=0.0 2023-11-28 03:24:38,502 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499650 2023-11-28 03:24:38,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.50 vs. limit=22.5 2023-11-28 03:24:57,155 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:25:04,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3331113.3333333335, ans=0.1 2023-11-28 03:25:10,988 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6700, loss[loss=0.06708, simple_loss=0.08258, pruned_loss=0.01743, audio_tagging_loss=0.008362, over 14203.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08898, pruned_loss=0.01232, audio_tagging_loss=0.008687, over 3047724.60 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:25:14,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.626e+01 9.557e+01 1.018e+02 1.449e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 03:25:16,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2023-11-28 03:25:18,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3331180.0, ans=0.0 2023-11-28 03:25:25,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3331246.6666666665, ans=0.2 2023-11-28 03:25:36,070 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499700 2023-11-28 03:25:39,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3331313.3333333335, ans=0.125 2023-11-28 03:25:40,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3331313.3333333335, ans=0.125 2023-11-28 03:25:42,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-28 03:26:08,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3331513.3333333335, ans=0.2 2023-11-28 03:26:08,888 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6750, loss[loss=0.04418, simple_loss=0.04931, pruned_loss=0.008118, audio_tagging_loss=0.01141, over 14834.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08885, pruned_loss=0.0124, audio_tagging_loss=0.008686, over 3044201.45 frames. 
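
The lr field holds at 1.61e-03 across all of these batches because the schedule decays with both batch count and epoch, and at batch index ~498k it changes far too slowly to move the rounded value. A sketch of icefall's Eden-style schedule; the base rate and the lr_batches/lr_epochs constants below are assumed values that happen to reproduce the logged rate around epoch 41-42:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Hedged sketch of the Eden schedule; all constants are assumptions.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, 498000.0, 41.0))  # ~1.61e-03, matching the batch lines
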
], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:26:12,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3331513.3333333335, ans=0.0 2023-11-28 03:26:14,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5 2023-11-28 03:26:29,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3331580.0, ans=0.2 2023-11-28 03:26:31,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3331646.6666666665, ans=0.05 2023-11-28 03:26:32,839 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499750 2023-11-28 03:26:40,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3331646.6666666665, ans=0.125 2023-11-28 03:26:51,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3331713.3333333335, ans=0.0 2023-11-28 03:27:00,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3331780.0, ans=0.125 2023-11-28 03:27:01,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3331780.0, ans=0.125 2023-11-28 03:27:06,685 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6800, loss[loss=0.05886, simple_loss=0.07842, pruned_loss=0.009249, audio_tagging_loss=0.0104, over 14391.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08885, pruned_loss=0.01234, audio_tagging_loss=0.008669, over 3052223.61 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:27:07,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2023-11-28 03:27:09,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3331846.6666666665, ans=0.125 2023-11-28 03:27:10,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.683e+01 9.159e+01 9.907e+01 1.833e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-28 03:27:18,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3331913.3333333335, ans=0.0 2023-11-28 03:27:21,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0 2023-11-28 03:27:24,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3331913.3333333335, ans=0.1 2023-11-28 03:27:24,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2023-11-28 03:27:28,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. 
limit=15.0 2023-11-28 03:27:30,267 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499800 2023-11-28 03:27:31,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-28 03:27:34,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3331980.0, ans=0.125 2023-11-28 03:27:34,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3331980.0, ans=0.2 2023-11-28 03:27:58,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3332113.3333333335, ans=0.125 2023-11-28 03:28:03,799 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6850, loss[loss=0.06882, simple_loss=0.08998, pruned_loss=0.01377, audio_tagging_loss=0.01006, over 14191.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0892, pruned_loss=0.01235, audio_tagging_loss=0.008565, over 3046040.29 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:28:06,297 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:28:15,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3332246.6666666665, ans=0.125 2023-11-28 03:28:25,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3332313.3333333335, ans=10.0 2023-11-28 03:28:28,020 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499850 2023-11-28 03:28:28,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3332313.3333333335, ans=0.0 2023-11-28 03:28:47,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3332380.0, ans=0.1 2023-11-28 03:28:53,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3332446.6666666665, ans=0.0 2023-11-28 03:28:53,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3332446.6666666665, ans=0.0 2023-11-28 03:28:59,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-11-28 03:29:01,371 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6900, loss[loss=0.05748, simple_loss=0.07791, pruned_loss=0.01097, audio_tagging_loss=0.007561, over 14747.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08909, pruned_loss=0.01219, audio_tagging_loss=0.008575, over 3045280.29 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:29:01,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.97 vs. 
limit=22.5 2023-11-28 03:29:07,515 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.595e+01 9.072e+01 9.849e+01 1.232e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-28 03:29:09,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3332513.3333333335, ans=0.1 2023-11-28 03:29:14,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3332580.0, ans=0.2 2023-11-28 03:29:16,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3332580.0, ans=0.125 2023-11-28 03:29:21,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3332580.0, ans=0.125 2023-11-28 03:29:25,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499900 2023-11-28 03:29:47,263 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:29:55,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3332780.0, ans=0.125 2023-11-28 03:29:56,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3332780.0, ans=0.125 2023-11-28 03:29:58,883 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:29:59,742 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6950, loss[loss=0.06348, simple_loss=0.07681, pruned_loss=0.01476, audio_tagging_loss=0.01031, over 14244.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08912, pruned_loss=0.0121, audio_tagging_loss=0.008656, over 3042289.59 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:30:13,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3332913.3333333335, ans=0.0 2023-11-28 03:30:19,469 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:30:20,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3332980.0, ans=0.0 2023-11-28 03:30:23,186 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499950 2023-11-28 03:30:27,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3332980.0, ans=0.125 2023-11-28 03:30:48,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3333113.3333333335, ans=0.125 2023-11-28 03:30:56,304 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7000, loss[loss=0.06, simple_loss=0.0784, pruned_loss=0.01178, audio_tagging_loss=0.009022, over 14667.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08892, pruned_loss=0.01205, audio_tagging_loss=0.008724, over 3039299.15 frames. 
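], batch size: 56, lr: 1.61e-03, grad_scale: 8.0

The WARNING a few entries above excludes an AudioSet placeholder cut: its 100 feature frames shrink to 23 after the convolutional subsampling, fewer than its 24 BPE tokens, and a transducer alignment needs at least one frame per token. A sketch of the kind of predicate behind such a filter; the helper names are illustrative, and the subsampling formula is an assumption that happens to reproduce the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # Reproduces the logged 100 -> 23 reduction; assumed to match the
        # encoder front end's roughly 4x convolutional subsampling.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Exclude cuts with fewer subsampled frames than tokens (23 < 24 here).
        return frames_after_subsampling(num_frames) >= num_tokens

    assert keep_cut(100, 24) is False  # excluded, as logged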
2023-11-28 03:31:01,686 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.564e+01 9.211e+01 9.659e+01 1.272e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 03:31:15,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3333246.6666666665, ans=0.0 2023-11-28 03:31:20,416 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500000 2023-11-28 03:31:26,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3333313.3333333335, ans=0.0 2023-11-28 03:31:32,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3333380.0, ans=0.0 2023-11-28 03:31:34,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3333380.0, ans=0.125 2023-11-28 03:31:45,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3333446.6666666665, ans=0.0 2023-11-28 03:31:46,422 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:31:47,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3333446.6666666665, ans=0.125 2023-11-28 03:31:55,576 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7050, loss[loss=0.06299, simple_loss=0.09025, pruned_loss=0.009806, audio_tagging_loss=0.00806, over 15081.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0893, pruned_loss=0.01216, audio_tagging_loss=0.008742, over 3040097.55 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:32:15,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3333580.0, ans=0.125 2023-11-28 03:32:18,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2023-11-28 03:32:18,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3333646.6666666665, ans=0.125 2023-11-28 03:32:19,953 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500050 2023-11-28 03:32:22,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3333646.6666666665, ans=0.125 2023-11-28 03:32:52,927 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7100, loss[loss=0.06749, simple_loss=0.08674, pruned_loss=0.01546, audio_tagging_loss=0.008655, over 15825.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08873, pruned_loss=0.01207, audio_tagging_loss=0.00885, over 3045562.69 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:32:54,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3333846.6666666665, ans=0.0 2023-11-28 03:32:58,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.733e+01 9.408e+01 1.010e+02 1.480e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 03:33:05,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.34 vs.
limit=15.0 2023-11-28 03:33:08,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333913.3333333335, ans=0.1 2023-11-28 03:33:15,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3333980.0, ans=0.0 2023-11-28 03:33:16,488 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500100 2023-11-28 03:33:17,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3333980.0, ans=0.0 2023-11-28 03:33:27,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3334046.6666666665, ans=0.0 2023-11-28 03:33:49,672 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7150, loss[loss=0.07573, simple_loss=0.1064, pruned_loss=0.01539, audio_tagging_loss=0.007116, over 15140.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08921, pruned_loss=0.01219, audio_tagging_loss=0.008892, over 3043874.70 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:34:11,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3334313.3333333335, ans=0.125 2023-11-28 03:34:13,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500150 2023-11-28 03:34:15,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3334313.3333333335, ans=0.0 2023-11-28 03:34:29,774 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:34:32,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3334380.0, ans=0.2 2023-11-28 03:34:39,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-28 03:34:46,521 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7200, loss[loss=0.07486, simple_loss=0.1082, pruned_loss=0.01409, audio_tagging_loss=0.006677, over 14095.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08963, pruned_loss=0.01229, audio_tagging_loss=0.008879, over 3047715.02 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:34:46,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3334513.3333333335, ans=0.0 2023-11-28 03:34:46,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3334513.3333333335, ans=0.1 2023-11-28 03:34:46,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3334513.3333333335, ans=0.125 2023-11-28 03:34:51,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. 
limit=6.0 2023-11-28 03:34:51,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.895e+01 9.379e+01 1.001e+02 1.500e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 03:34:56,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3334580.0, ans=0.2 2023-11-28 03:35:06,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3334580.0, ans=0.125 2023-11-28 03:35:10,571 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500200 2023-11-28 03:35:10,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3334646.6666666665, ans=0.1 2023-11-28 03:35:15,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3334646.6666666665, ans=0.2 2023-11-28 03:35:24,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-28 03:35:24,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2023-11-28 03:35:25,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-28 03:35:30,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3334713.3333333335, ans=0.1 2023-11-28 03:35:37,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3334780.0, ans=0.02 2023-11-28 03:35:43,148 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7250, loss[loss=0.05363, simple_loss=0.07261, pruned_loss=0.008236, audio_tagging_loss=0.009085, over 14653.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08919, pruned_loss=0.01214, audio_tagging_loss=0.008918, over 3046631.21 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:36:07,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500250 2023-11-28 03:36:28,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2023-11-28 03:36:29,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3335113.3333333335, ans=0.125 2023-11-28 03:36:40,976 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7300, loss[loss=0.07454, simple_loss=0.105, pruned_loss=0.0152, audio_tagging_loss=0.00686, over 15568.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09003, pruned_loss=0.01233, audio_tagging_loss=0.008823, over 3048549.58 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:36:46,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.677e+01 9.313e+01 1.019e+02 2.186e+02, threshold=1.863e+02, percent-clipped=1.0 2023-11-28 03:36:54,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.80 vs. 
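limit=10.0

The scaling.py Whitening lines compare a per-module statistic against a limit, and corrective pressure is applied only when the metric exceeds the limit (here 4.80 vs. 10.0, so none is needed). One plausible way to quantify how "white" activations are, shown for illustration and not necessarily the exact formula in scaling.py, is the eigenvalue spread of the feature covariance, which is 1.0 for perfectly white features and grows as energy concentrates in a few directions:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations from one module.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # eigenvalues of the covariance
        return (eigs ** 2).mean() / eigs.mean() ** 2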
2023-11-28 03:37:04,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500300 2023-11-28 03:37:17,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5 2023-11-28 03:37:20,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3335380.0, ans=0.0 2023-11-28 03:37:21,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3335380.0, ans=0.125 2023-11-28 03:37:26,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3335446.6666666665, ans=0.0 2023-11-28 03:37:37,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3335513.3333333335, ans=0.125 2023-11-28 03:37:38,162 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7350, loss[loss=0.07392, simple_loss=0.1039, pruned_loss=0.01618, audio_tagging_loss=0.005816, over 15297.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09003, pruned_loss=0.01237, audio_tagging_loss=0.008715, over 3052965.24 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:38:02,884 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500350 2023-11-28 03:38:13,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3335713.3333333335, ans=0.0 2023-11-28 03:38:21,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3335713.3333333335, ans=0.125 2023-11-28 03:38:21,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2023-11-28 03:38:25,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3335780.0, ans=0.125 2023-11-28 03:38:27,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3335780.0, ans=0.125 2023-11-28 03:38:27,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3335780.0, ans=0.125 2023-11-28 03:38:32,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3335780.0, ans=0.125 2023-11-28 03:38:35,810 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7400, loss[loss=0.06942, simple_loss=0.1053, pruned_loss=0.01189, audio_tagging_loss=0.004901, over 15376.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09009, pruned_loss=0.01242, audio_tagging_loss=0.008636, over 3048282.40 frames.
], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:38:39,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3335846.6666666665, ans=0.95 2023-11-28 03:38:43,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.811e+01 9.404e+01 1.022e+02 2.241e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-28 03:38:51,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335913.3333333335, ans=0.1 2023-11-28 03:39:00,592 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500400 2023-11-28 03:39:00,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3335980.0, ans=0.125 2023-11-28 03:39:27,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3336113.3333333335, ans=0.125 2023-11-28 03:39:34,651 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7450, loss[loss=0.06341, simple_loss=0.08275, pruned_loss=0.01597, audio_tagging_loss=0.006069, over 14384.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08925, pruned_loss=0.01242, audio_tagging_loss=0.00857, over 3048649.39 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:39:38,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3336180.0, ans=0.125 2023-11-28 03:39:38,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.50 vs. limit=10.0 2023-11-28 03:39:58,194 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500450 2023-11-28 03:40:24,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3336446.6666666665, ans=0.2 2023-11-28 03:40:30,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3336513.3333333335, ans=0.125 2023-11-28 03:40:31,117 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7500, loss[loss=0.06201, simple_loss=0.07824, pruned_loss=0.01172, audio_tagging_loss=0.01116, over 15991.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08937, pruned_loss=0.0125, audio_tagging_loss=0.008535, over 3043831.20 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:40:31,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2023-11-28 03:40:38,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 9.074e+01 9.605e+01 1.016e+02 1.899e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-28 03:40:43,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.74 vs. 
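limit=15.0

The optim.py Clipping_scale lines summarize recent gradient norms as five ascending values (apparently min, quartiles, and max) plus a clipping threshold; in every entry in this stretch the threshold is about Clipping_scale times the median, e.g. 2.0 * 9.605e+01 = 1.921e+02 a few entries above. A sketch of that bookkeeping, as an illustration rather than the optimizer's actual implementation:

    import torch

    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: norms of the gradients from recent optimizer steps.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # ~2x the median, matching the log
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped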
2023-11-28 03:40:55,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500500 2023-11-28 03:41:01,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3336646.6666666665, ans=0.0 2023-11-28 03:41:11,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3336713.3333333335, ans=0.125 2023-11-28 03:41:25,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.73 vs. limit=15.0 2023-11-28 03:41:28,464 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7550, loss[loss=0.07177, simple_loss=0.1097, pruned_loss=0.009302, audio_tagging_loss=0.007609, over 15504.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08921, pruned_loss=0.0124, audio_tagging_loss=0.008504, over 3046330.60 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:41:31,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3336846.6666666665, ans=0.1 2023-11-28 03:41:52,842 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500550 2023-11-28 03:41:53,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3336980.0, ans=0.07 2023-11-28 03:42:02,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2023-11-28 03:42:13,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3337113.3333333335, ans=0.0 2023-11-28 03:42:16,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3337113.3333333335, ans=0.0 2023-11-28 03:42:24,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337113.3333333335, ans=0.1 2023-11-28 03:42:26,171 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7600, loss[loss=0.05817, simple_loss=0.07797, pruned_loss=0.01046, audio_tagging_loss=0.008722, over 16160.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08906, pruned_loss=0.01238, audio_tagging_loss=0.008517, over 3053490.90 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:42:28,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3337180.0, ans=0.0 2023-11-28 03:42:32,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.828e+01 9.447e+01 1.020e+02 1.254e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 03:42:50,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500600 2023-11-28 03:42:51,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3337313.3333333335, ans=0.0 2023-11-28 03:43:10,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337380.0, ans=0.1 2023-11-28 03:43:12,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs.
limit=6.0 2023-11-28 03:43:21,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3337446.6666666665, ans=0.1 2023-11-28 03:43:23,924 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7650, loss[loss=0.08223, simple_loss=0.1161, pruned_loss=0.01489, audio_tagging_loss=0.009274, over 16171.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08854, pruned_loss=0.0123, audio_tagging_loss=0.008557, over 3046843.36 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:43:26,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3337513.3333333335, ans=0.125 2023-11-28 03:43:48,259 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500650 2023-11-28 03:44:09,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3337780.0, ans=0.125 2023-11-28 03:44:10,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2023-11-28 03:44:19,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3337780.0, ans=0.125 2023-11-28 03:44:21,213 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7700, loss[loss=0.0552, simple_loss=0.07481, pruned_loss=0.01062, audio_tagging_loss=0.007177, over 15028.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08944, pruned_loss=0.01256, audio_tagging_loss=0.008535, over 3043675.70 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:44:22,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3337846.6666666665, ans=0.125 2023-11-28 03:44:27,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.661e+01 9.049e+01 9.903e+01 1.330e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-28 03:44:27,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3337846.6666666665, ans=0.125 2023-11-28 03:44:44,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500700 2023-11-28 03:44:53,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3337980.0, ans=0.125 2023-11-28 03:45:04,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3338046.6666666665, ans=0.2 2023-11-28 03:45:15,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3338113.3333333335, ans=0.125 2023-11-28 03:45:15,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3338113.3333333335, ans=0.0 2023-11-28 03:45:18,639 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7750, loss[loss=0.05799, simple_loss=0.078, pruned_loss=0.01058, audio_tagging_loss=0.00841, over 15842.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09028, pruned_loss=0.01266, audio_tagging_loss=0.008558, over 3048484.60 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:45:21,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3338180.0, ans=0.125 2023-11-28 03:45:24,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3338180.0, ans=0.0 2023-11-28 03:45:42,982 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500750 2023-11-28 03:46:15,549 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7800, loss[loss=0.05761, simple_loss=0.07448, pruned_loss=0.01015, audio_tagging_loss=0.01022, over 14946.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08989, pruned_loss=0.01249, audio_tagging_loss=0.008706, over 3040530.83 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:46:22,527 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.833e+01 9.588e+01 1.059e+02 1.292e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 03:46:23,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-28 03:46:28,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.00 vs. limit=15.0 2023-11-28 03:46:29,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3338580.0, ans=0.125 2023-11-28 03:46:29,019 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:46:31,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3338580.0, ans=0.025 2023-11-28 03:46:38,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.34 vs. limit=6.0 2023-11-28 03:46:39,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500800 2023-11-28 03:46:46,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3338646.6666666665, ans=0.2 2023-11-28 03:46:54,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5 2023-11-28 03:47:08,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3338780.0, ans=0.2 2023-11-28 03:47:13,889 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7850, loss[loss=0.06411, simple_loss=0.08303, pruned_loss=0.01079, audio_tagging_loss=0.01181, over 14273.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09068, pruned_loss=0.01251, audio_tagging_loss=0.008639, over 3046591.69 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:47:25,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3338913.3333333335, ans=0.2 2023-11-28 03:47:34,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3338913.3333333335, ans=0.09899494936611666 2023-11-28 03:47:37,909 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500850 2023-11-28 03:48:10,412 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7900, loss[loss=0.07561, simple_loss=0.1021, pruned_loss=0.01297, audio_tagging_loss=0.01161, over 15107.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09074, pruned_loss=0.01248, audio_tagging_loss=0.008728, over 3043803.36 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:48:17,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.758e+01 9.324e+01 1.005e+02 1.322e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 03:48:18,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3339180.0, ans=0.2 2023-11-28 03:48:20,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2023-11-28 03:48:22,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3339246.6666666665, ans=0.2 2023-11-28 03:48:30,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3339246.6666666665, ans=0.125 2023-11-28 03:48:33,960 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500900 2023-11-28 03:48:34,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-28 03:49:00,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3339446.6666666665, ans=0.125 2023-11-28 03:49:06,822 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7950, loss[loss=0.07844, simple_loss=0.1126, pruned_loss=0.01193, audio_tagging_loss=0.01023, over 14951.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09017, pruned_loss=0.0124, audio_tagging_loss=0.008802, over 3039783.97 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:49:24,546 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 03:49:31,080 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500950 2023-11-28 03:49:37,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3339646.6666666665, ans=0.1 2023-11-28 03:49:58,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3339780.0, ans=0.1 2023-11-28 03:50:00,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3339780.0, ans=0.0 2023-11-28 03:50:00,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=22.5 2023-11-28 03:50:04,223 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8000, loss[loss=0.04396, simple_loss=0.05624, pruned_loss=0.005166, audio_tagging_loss=0.01067, over 15302.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08886, pruned_loss=0.01215, audio_tagging_loss=0.008982, over 3033420.20 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:50:11,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.539e+01 9.143e+01 9.818e+01 1.375e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-28 03:50:23,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3339913.3333333335, ans=0.0 2023-11-28 03:50:26,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3339980.0, ans=0.0 2023-11-28 03:50:28,946 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501000 2023-11-28 03:50:32,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3339980.0, ans=0.125 2023-11-28 03:50:34,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3339980.0, ans=0.2 2023-11-28 03:50:46,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3340046.6666666665, ans=0.125 2023-11-28 03:50:53,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3340113.3333333335, ans=0.125 2023-11-28 03:50:54,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3340113.3333333335, ans=0.09899494936611666 2023-11-28 03:51:02,056 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8050, loss[loss=0.06491, simple_loss=0.0946, pruned_loss=0.008849, audio_tagging_loss=0.008767, over 15393.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08899, pruned_loss=0.01224, audio_tagging_loss=0.00905, over 3033390.79 frames. 
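], batch size: 55, lr: 1.61e-03, grad_scale: 16.0

The grad_scale printed with each tot_loss is consistent with the loss scale of PyTorch's automatic mixed precision: it grows periodically and is halved when a step hits inf/nan gradients, which is presumably why it drops from 32.0 at batch 8000 to 16.0 by batch 8050. A generic sketch of a loop producing such values, not the actual train_asr.py loop:

    import torch

    scaler = torch.cuda.amp.GradScaler(growth_factor=2.0, backoff_factor=0.5)

    def train_step(model, optimizer, criterion, inputs, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(inputs), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped if the grads contain inf/nan
        scaler.update()            # halves the scale on overflow, else may grow
        return scaler.get_scale()  # the value logged as grad_scale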
2023-11-28 03:51:26,187 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501050 2023-11-28 03:51:26,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3340313.3333333335, ans=0.125 2023-11-28 03:51:48,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3340446.6666666665, ans=0.125 2023-11-28 03:51:54,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2023-11-28 03:52:00,071 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8100, loss[loss=0.05349, simple_loss=0.07434, pruned_loss=0.008227, audio_tagging_loss=0.00809, over 15463.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08955, pruned_loss=0.01219, audio_tagging_loss=0.008942, over 3044146.60 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:52:03,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2023-11-28 03:52:07,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.647e+01 9.377e+01 1.005e+02 1.143e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 03:52:15,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-11-28 03:52:16,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3340580.0, ans=0.125 2023-11-28 03:52:24,095 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501100 2023-11-28 03:52:27,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3340646.6666666665, ans=0.125 2023-11-28 03:52:32,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3340646.6666666665, ans=0.1 2023-11-28 03:52:47,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-28 03:52:51,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.59 vs. limit=10.0 2023-11-28 03:52:56,854 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8150, loss[loss=0.07175, simple_loss=0.1002, pruned_loss=0.01125, audio_tagging_loss=0.0104, over 14127.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09023, pruned_loss=0.01236, audio_tagging_loss=0.008853, over 3041461.75 frames.
], batch size: 52, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:53:21,321 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501150 2023-11-28 03:53:26,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3340980.0, ans=0.1 2023-11-28 03:53:33,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3341046.6666666665, ans=0.2 2023-11-28 03:53:34,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.36 vs. limit=10.0 2023-11-28 03:53:40,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3341046.6666666665, ans=0.1 2023-11-28 03:53:53,993 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8200, loss[loss=0.06997, simple_loss=0.09694, pruned_loss=0.01562, audio_tagging_loss=0.005886, over 16112.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09002, pruned_loss=0.01249, audio_tagging_loss=0.008774, over 3042287.05 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:53:57,326 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:54:02,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.802e+01 9.578e+01 1.025e+02 1.373e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 03:54:09,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3341246.6666666665, ans=0.1 2023-11-28 03:54:13,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3341246.6666666665, ans=0.125 2023-11-28 03:54:17,765 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501200 2023-11-28 03:54:17,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3341313.3333333335, ans=0.125 2023-11-28 03:54:47,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-11-28 03:54:51,790 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8250, loss[loss=0.06398, simple_loss=0.08463, pruned_loss=0.009781, audio_tagging_loss=0.01189, over 14843.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09004, pruned_loss=0.01243, audio_tagging_loss=0.008692, over 3041213.15 frames. 
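], batch size: 57, lr: 1.61e-03, grad_scale: 16.0

The many scaling.py:213 entries print ScheduledFloat values: hyperparameters such as dropout_p, skip rates, and balancer probabilities that are functions of batch_count rather than constants, which is why each is logged together with the batch_count at which it was evaluated. A minimal sketch of a piecewise-linear schedule of this kind, with a deliberately simplified interface compared to the real ScheduledFloat:

    def scheduled_float(batch_count: float, points) -> float:
        # points: (batch_count, value) pairs sorted by batch_count.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20000 batches
    # would have long since reached its final value at these batch counts:
    assert scheduled_float(3341513.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1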
2023-11-28 03:54:55,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3341513.3333333335, ans=0.0 2023-11-28 03:54:56,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3341513.3333333335, ans=0.0 2023-11-28 03:55:00,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3341513.3333333335, ans=0.125 2023-11-28 03:55:01,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3341580.0, ans=0.0 2023-11-28 03:55:15,157 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501250 2023-11-28 03:55:28,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.44 vs. limit=15.0 2023-11-28 03:55:42,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3341780.0, ans=0.125 2023-11-28 03:55:44,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.48 vs. limit=15.0 2023-11-28 03:55:47,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-11-28 03:55:48,579 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8300, loss[loss=0.0902, simple_loss=0.1124, pruned_loss=0.02582, audio_tagging_loss=0.0082, over 15427.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09018, pruned_loss=0.0124, audio_tagging_loss=0.008735, over 3040133.78 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:55:48,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3341846.6666666665, ans=0.125 2023-11-28 03:55:56,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.790e+01 9.364e+01 1.000e+02 1.308e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 03:56:13,761 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501300 2023-11-28 03:56:14,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3341980.0, ans=0.125 2023-11-28 03:56:18,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3341980.0, ans=0.125 2023-11-28 03:56:45,953 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8350, loss[loss=0.04836, simple_loss=0.05664, pruned_loss=0.008975, audio_tagging_loss=0.01106, over 15147.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08958, pruned_loss=0.01228, audio_tagging_loss=0.008761, over 3038243.63 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:56:47,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.01 vs.
limit=10.0 2023-11-28 03:56:49,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3342180.0, ans=0.125 2023-11-28 03:56:52,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3342180.0, ans=0.1 2023-11-28 03:57:10,701 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501350 2023-11-28 03:57:23,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3342380.0, ans=0.0 2023-11-28 03:57:26,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3342380.0, ans=0.1 2023-11-28 03:57:31,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3342446.6666666665, ans=0.0 2023-11-28 03:57:32,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3342446.6666666665, ans=0.2 2023-11-28 03:57:42,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3342446.6666666665, ans=0.125 2023-11-28 03:57:43,977 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8400, loss[loss=0.04709, simple_loss=0.06583, pruned_loss=0.006027, audio_tagging_loss=0.008149, over 15091.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09023, pruned_loss=0.01231, audio_tagging_loss=0.008699, over 3040492.91 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:57:45,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3342513.3333333335, ans=0.1 2023-11-28 03:57:51,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.873e+01 9.503e+01 1.023e+02 1.226e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 03:57:57,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-11-28 03:58:07,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501400 2023-11-28 03:58:41,308 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8450, loss[loss=0.06984, simple_loss=0.1019, pruned_loss=0.01083, audio_tagging_loss=0.008041, over 15102.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09128, pruned_loss=0.01241, audio_tagging_loss=0.008658, over 3045973.39 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 03:58:56,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3342913.3333333335, ans=0.125 2023-11-28 03:58:59,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3342913.3333333335, ans=0.0 2023-11-28 03:59:05,841 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501450 2023-11-28 03:59:19,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. 
limit=12.0 2023-11-28 03:59:23,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3343046.6666666665, ans=0.125 2023-11-28 03:59:38,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3343180.0, ans=0.1 2023-11-28 03:59:39,103 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8500, loss[loss=0.08702, simple_loss=0.125, pruned_loss=0.0195, audio_tagging_loss=0.004998, over 14820.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0916, pruned_loss=0.01242, audio_tagging_loss=0.008615, over 3046497.72 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 03:59:39,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3343180.0, ans=0.125 2023-11-28 03:59:46,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.888e+01 9.285e+01 1.024e+02 1.288e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 03:59:50,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3343246.6666666665, ans=0.0 2023-11-28 04:00:03,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501500 2023-11-28 04:00:29,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.01 vs. limit=15.0 2023-11-28 04:00:36,603 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8550, loss[loss=0.04622, simple_loss=0.05674, pruned_loss=0.00737, audio_tagging_loss=0.01048, over 16072.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09124, pruned_loss=0.01245, audio_tagging_loss=0.008592, over 3046681.31 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:01:00,896 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501550 2023-11-28 04:01:04,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3343646.6666666665, ans=0.2 2023-11-28 04:01:05,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3343646.6666666665, ans=0.125 2023-11-28 04:01:06,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3343646.6666666665, ans=0.1 2023-11-28 04:01:20,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3343713.3333333335, ans=0.0 2023-11-28 04:01:21,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3343780.0, ans=0.035 2023-11-28 04:01:25,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-28 04:01:31,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3343780.0, ans=0.05 2023-11-28 04:01:33,867 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8600, loss[loss=0.05988, simple_loss=0.07875, pruned_loss=0.009219, audio_tagging_loss=0.01128, over 16200.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09037, pruned_loss=0.01234, audio_tagging_loss=0.008681, over 3045946.72 frames. 
], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:01:42,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.741e+01 9.411e+01 9.975e+01 1.880e+02, threshold=1.882e+02, percent-clipped=1.0 2023-11-28 04:01:44,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3343913.3333333335, ans=0.0 2023-11-28 04:01:53,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3343913.3333333335, ans=0.125 2023-11-28 04:01:57,977 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501600 2023-11-28 04:02:03,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3343980.0, ans=0.125 2023-11-28 04:02:11,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3344046.6666666665, ans=0.2 2023-11-28 04:02:11,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=22.5 2023-11-28 04:02:17,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3344046.6666666665, ans=0.125 2023-11-28 04:02:23,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3344113.3333333335, ans=0.02 2023-11-28 04:02:31,102 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8650, loss[loss=0.07657, simple_loss=0.1087, pruned_loss=0.01388, audio_tagging_loss=0.00834, over 14307.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09053, pruned_loss=0.01232, audio_tagging_loss=0.008707, over 3052872.96 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:02:45,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2023-11-28 04:02:55,659 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501650 2023-11-28 04:03:11,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3344380.0, ans=0.0 2023-11-28 04:03:14,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3344380.0, ans=0.125 2023-11-28 04:03:23,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3344446.6666666665, ans=0.125 2023-11-28 04:03:28,905 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8700, loss[loss=0.07092, simple_loss=0.09277, pruned_loss=0.01435, audio_tagging_loss=0.01018, over 15369.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09099, pruned_loss=0.01244, audio_tagging_loss=0.008789, over 3056406.82 frames. 
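
In every optim.py record, the printed clipping threshold equals Clipping_scale times the median of the recent gradient norms: in the record above, 2.0 × 9.411e+01 = 1.882e+02. percent-clipped apparently reports how often recent batches exceeded that adaptive threshold. A minimal sketch of such a median-based policy (an illustration of the rule the numbers follow, not icefall's optim.py verbatim):

```python
# Minimal sketch of median-based adaptive gradient clipping, assuming
# the rule threshold = clipping_scale * median(recent grad norms).
from collections import deque
import torch

class AdaptiveClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.scale = clipping_scale
        self.norms: deque = deque(maxlen=window)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.scale * median
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)  # rescale to the threshold
        return threshold
```
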
], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:03:33,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3344513.3333333335, ans=0.125 2023-11-28 04:03:37,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.850e+01 9.398e+01 9.849e+01 1.274e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 04:03:53,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501700 2023-11-28 04:03:53,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3344646.6666666665, ans=0.125 2023-11-28 04:04:01,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3344713.3333333335, ans=0.0 2023-11-28 04:04:14,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3344780.0, ans=0.125 2023-11-28 04:04:26,114 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8750, loss[loss=0.04792, simple_loss=0.06122, pruned_loss=0.006182, audio_tagging_loss=0.01113, over 14220.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09097, pruned_loss=0.01244, audio_tagging_loss=0.008847, over 3046183.14 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:04:29,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3344846.6666666665, ans=0.0 2023-11-28 04:04:35,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3344846.6666666665, ans=0.2 2023-11-28 04:04:49,565 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501750 2023-11-28 04:05:22,943 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8800, loss[loss=0.06412, simple_loss=0.09343, pruned_loss=0.009615, audio_tagging_loss=0.007786, over 15744.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09136, pruned_loss=0.01259, audio_tagging_loss=0.008896, over 3046955.19 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:05:24,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3345180.0, ans=0.125 2023-11-28 04:05:31,625 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.835e+01 9.360e+01 1.012e+02 1.261e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 04:05:32,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0 2023-11-28 04:05:46,827 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501800 2023-11-28 04:05:59,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3345380.0, ans=0.5 2023-11-28 04:06:19,641 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8850, loss[loss=0.092, simple_loss=0.1293, pruned_loss=0.021, audio_tagging_loss=0.006373, over 15293.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09152, pruned_loss=0.01274, audio_tagging_loss=0.008876, over 3047042.58 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:06:34,736 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
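
The WARNING above excludes a placeholder AudioSet cut because its length after subsampling (23 frames) is smaller than its token count (24), which would leave no valid transducer alignment. The 100 → 23 mapping is consistent with a convolutional front end that roughly quarters the frame count; the exact formula below is an assumption chosen to reproduce it.

```python
# Sketch of the length filter behind the WARNING above: a cut is dropped
# when its subsampled frame count falls below its token count. The
# front-end formula is an assumption that happens to map 100 -> 23.
def subsampled_frames(t: int) -> int:
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return subsampled_frames(num_frames) >= num_tokens

print(subsampled_frames(100))  # 23, as in the warning
print(keep_cut(100, 24))       # False: 23 frames cannot cover 24 tokens
```
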
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:06:37,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2023-11-28 04:06:44,164 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501850 2023-11-28 04:06:57,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3345713.3333333335, ans=0.0 2023-11-28 04:07:06,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3345780.0, ans=0.125 2023-11-28 04:07:09,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3345780.0, ans=0.07 2023-11-28 04:07:10,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.66 vs. limit=15.0 2023-11-28 04:07:16,660 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8900, loss[loss=0.06179, simple_loss=0.09009, pruned_loss=0.009846, audio_tagging_loss=0.006895, over 15217.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09111, pruned_loss=0.0125, audio_tagging_loss=0.008778, over 3048226.61 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:07:25,984 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.854e+01 9.513e+01 9.955e+01 1.488e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 04:07:27,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3345913.3333333335, ans=0.125 2023-11-28 04:07:36,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3345913.3333333335, ans=0.125 2023-11-28 04:07:40,796 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501900 2023-11-28 04:08:06,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3346113.3333333335, ans=0.0 2023-11-28 04:08:09,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3346113.3333333335, ans=0.0 2023-11-28 04:08:14,218 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8950, loss[loss=0.05868, simple_loss=0.08608, pruned_loss=0.008877, audio_tagging_loss=0.006766, over 16052.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09083, pruned_loss=0.01232, audio_tagging_loss=0.008635, over 3049009.58 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:08:20,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3346180.0, ans=0.125 2023-11-28 04:08:34,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. 
limit=15.0 2023-11-28 04:08:36,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3346313.3333333335, ans=0.125 2023-11-28 04:08:37,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501950 2023-11-28 04:08:46,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3346380.0, ans=0.2 2023-11-28 04:08:47,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3346380.0, ans=0.125 2023-11-28 04:08:57,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2023-11-28 04:08:58,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2023-11-28 04:09:10,176 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9000, loss[loss=0.06044, simple_loss=0.0815, pruned_loss=0.01199, audio_tagging_loss=0.007707, over 15241.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09107, pruned_loss=0.01247, audio_tagging_loss=0.008553, over 3041432.54 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:09:10,176 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 04:09:30,441 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8076, 4.9485, 5.0791, 4.8621], device='cuda:3') 2023-11-28 04:09:43,087 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0535, 2.6805, 2.9572, 2.7810], device='cuda:3') 2023-11-28 04:09:44,947 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05915, simple_loss=0.05063, pruned_loss=0.005264, audio_tagging_loss=0.02857, over 4681554.00 frames. 2023-11-28 04:09:44,948 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 04:09:51,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. 
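
During the validation pass above, zipformer.py also prints one attn_weights_entropy tensor per selected layer; the four entries presumably correspond to that module's attention heads (an inference from the tensor shape, not something the log states). A sketch of the statistic:

```python
# Sketch of a per-head attention-entropy diagnostic like the
# attn_weights_entropy tensors printed above; "one value per head" is an
# inference from the four entries, not stated by the log.
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, query_len, key_len), rows already softmaxed.
    Returns the mean entropy (nats) of each head's attention rows."""
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, query_len)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 50, 200), dim=-1)
print(attn_weights_entropy(attn))  # ~4.8 nats; uniform rows would give log(200) ~ 5.3
```
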
limit=22.5 2023-11-28 04:09:54,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.664e+01 9.503e+01 1.037e+02 1.475e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 04:09:54,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3346513.3333333335, ans=0.0 2023-11-28 04:10:00,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3346580.0, ans=0.1 2023-11-28 04:10:09,069 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502000 2023-11-28 04:10:26,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3346713.3333333335, ans=0.125 2023-11-28 04:10:34,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3346780.0, ans=0.1 2023-11-28 04:10:34,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3346780.0, ans=0.125 2023-11-28 04:10:43,075 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9050, loss[loss=0.06437, simple_loss=0.08269, pruned_loss=0.01088, audio_tagging_loss=0.01215, over 15322.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09167, pruned_loss=0.01252, audio_tagging_loss=0.00859, over 3039360.38 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:11:06,621 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502050 2023-11-28 04:11:07,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3346980.0, ans=0.125 2023-11-28 04:11:22,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3347046.6666666665, ans=0.0 2023-11-28 04:11:26,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3347046.6666666665, ans=0.125 2023-11-28 04:11:30,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3347113.3333333335, ans=0.2 2023-11-28 04:11:33,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3347113.3333333335, ans=0.1 2023-11-28 04:11:40,119 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9100, loss[loss=0.05342, simple_loss=0.06928, pruned_loss=0.008876, audio_tagging_loss=0.009904, over 14456.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09021, pruned_loss=0.01233, audio_tagging_loss=0.008549, over 3038240.13 frames. 
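
The validation records themselves ("Computing validation loss" followed by "Epoch 42, validation: ...") reflect a periodic full pass over a fixed dev set; the frame total (4,681,554.00) is identical across validation passes, indicating the same cuts are evaluated each time. A hedged sketch of such a loop, with forward_pass standing in for the model's actual batch interface (an assumed helper, not icefall's API):

```python
# Hedged sketch of a frame-weighted validation pass; forward_pass is an
# assumed stand-in for the model's real batch interface.
import torch

def compute_validation_loss(model, dev_loader, forward_pass) -> float:
    """forward_pass(model, batch) -> (scalar loss tensor, num_frames)."""
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = forward_pass(model, batch)
            total_loss += loss.item() * num_frames
            total_frames += num_frames
    model.train()
    return total_loss / total_frames  # frame-weighted, like the logged averages
```
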
], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:11:44,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3347180.0, ans=0.2 2023-11-28 04:11:48,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.691e+01 9.383e+01 1.014e+02 1.282e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 04:12:03,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502100 2023-11-28 04:12:12,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3347313.3333333335, ans=0.125 2023-11-28 04:12:36,751 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9150, loss[loss=0.06628, simple_loss=0.0971, pruned_loss=0.01158, audio_tagging_loss=0.006153, over 15499.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09003, pruned_loss=0.01237, audio_tagging_loss=0.008611, over 3034890.15 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:12:57,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3347580.0, ans=0.0 2023-11-28 04:13:01,261 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502150 2023-11-28 04:13:01,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3347646.6666666665, ans=0.0 2023-11-28 04:13:15,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3347713.3333333335, ans=0.1 2023-11-28 04:13:16,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3347713.3333333335, ans=0.5 2023-11-28 04:13:28,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3347780.0, ans=0.125 2023-11-28 04:13:34,146 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9200, loss[loss=0.05858, simple_loss=0.07815, pruned_loss=0.01026, audio_tagging_loss=0.009237, over 14748.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09017, pruned_loss=0.01233, audio_tagging_loss=0.008578, over 3044090.37 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:13:37,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.38 vs. 
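
The bulk of the scaling.py:213 records print scheduled hyperparameters (dropout_p, skip_rate, bypass scale_min, and so on) whose value "ans" is a function of batch_count. A minimal sketch of a batch-count-keyed schedule; the piecewise-linear form and the breakpoints below are illustrative assumptions, not the repository's actual settings.

```python
# Minimal sketch of a batch-count-keyed schedule like the ScheduledFloat
# values logged above; breakpoints are illustrative assumptions.
class ScheduledValue:
    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order
        self.points = points

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation
        return pts[-1][1]

dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3342180.0))  # 0.1: far past the ramp, matching ans=0.1
```

At batch_count ≈ 3.35M every such schedule has long since settled, which is why the ans fields stay constant across these records; later lines show that even a whitening_limit (ans=15.0) is managed the same way.
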
limit=15.0 2023-11-28 04:13:40,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3347846.6666666665, ans=0.125 2023-11-28 04:13:44,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.837e+01 9.520e+01 1.030e+02 1.268e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 04:13:58,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502200 2023-11-28 04:14:01,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3347980.0, ans=0.2 2023-11-28 04:14:20,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3348113.3333333335, ans=0.5 2023-11-28 04:14:20,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3348113.3333333335, ans=0.0 2023-11-28 04:14:20,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=15.0 2023-11-28 04:14:26,880 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:14:32,147 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9250, loss[loss=0.0731, simple_loss=0.1021, pruned_loss=0.0138, audio_tagging_loss=0.008249, over 14959.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.0911, pruned_loss=0.01256, audio_tagging_loss=0.008547, over 3048541.69 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:14:35,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3348180.0, ans=0.125 2023-11-28 04:14:41,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=12.0 2023-11-28 04:14:55,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502250 2023-11-28 04:14:58,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3348313.3333333335, ans=0.125 2023-11-28 04:15:12,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0 2023-11-28 04:15:13,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=15.0 2023-11-28 04:15:16,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3348446.6666666665, ans=0.125 2023-11-28 04:15:21,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3348446.6666666665, ans=0.2 2023-11-28 04:15:23,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.84 vs. limit=10.0 2023-11-28 04:15:29,229 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9300, loss[loss=0.07774, simple_loss=0.1153, pruned_loss=0.01264, audio_tagging_loss=0.007434, over 15498.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09059, pruned_loss=0.01245, audio_tagging_loss=0.008562, over 3049802.42 frames. 
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:15:38,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3348513.3333333335, ans=0.125 2023-11-28 04:15:39,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.75 vs. limit=10.0 2023-11-28 04:15:40,868 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.853e+01 8.934e+01 9.500e+01 1.008e+02 1.455e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 04:15:41,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2023-11-28 04:15:43,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3348580.0, ans=0.125 2023-11-28 04:15:49,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3348580.0, ans=0.125 2023-11-28 04:15:52,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2023-11-28 04:15:53,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502300 2023-11-28 04:16:13,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.86 vs. limit=10.0 2023-11-28 04:16:15,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3348780.0, ans=0.125 2023-11-28 04:16:26,553 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9350, loss[loss=0.03767, simple_loss=0.03873, pruned_loss=0.005636, audio_tagging_loss=0.01267, over 15063.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09039, pruned_loss=0.01237, audio_tagging_loss=0.008652, over 3049430.51 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:16:50,807 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502350 2023-11-28 04:16:55,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3348980.0, ans=0.125 2023-11-28 04:17:23,704 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9400, loss[loss=0.05631, simple_loss=0.06964, pruned_loss=0.00905, audio_tagging_loss=0.01244, over 15390.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09043, pruned_loss=0.01248, audio_tagging_loss=0.008776, over 3045824.38 frames. 
], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:17:31,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3349180.0, ans=0.125 2023-11-28 04:17:35,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.937e+01 9.623e+01 1.033e+02 2.333e+02, threshold=1.925e+02, percent-clipped=1.0 2023-11-28 04:17:35,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3349246.6666666665, ans=0.125 2023-11-28 04:17:46,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3349313.3333333335, ans=0.2 2023-11-28 04:17:47,534 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502400 2023-11-28 04:17:58,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3349380.0, ans=0.0 2023-11-28 04:17:58,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3349380.0, ans=0.07 2023-11-28 04:17:59,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3349380.0, ans=0.125 2023-11-28 04:18:00,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. limit=10.0 2023-11-28 04:18:06,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3349380.0, ans=0.0 2023-11-28 04:18:14,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3349446.6666666665, ans=0.2 2023-11-28 04:18:21,493 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9450, loss[loss=0.05511, simple_loss=0.06924, pruned_loss=0.009681, audio_tagging_loss=0.01081, over 15135.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09021, pruned_loss=0.01249, audio_tagging_loss=0.008837, over 3046427.71 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:18:23,682 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:18:32,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3349580.0, ans=0.0 2023-11-28 04:18:40,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3349580.0, ans=0.2 2023-11-28 04:18:45,171 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502450 2023-11-28 04:18:48,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3349646.6666666665, ans=0.0 2023-11-28 04:18:53,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. 
limit=15.0 2023-11-28 04:19:11,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3349780.0, ans=0.05 2023-11-28 04:19:18,884 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9500, loss[loss=0.08783, simple_loss=0.1128, pruned_loss=0.02141, audio_tagging_loss=0.01005, over 14812.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09026, pruned_loss=0.0125, audio_tagging_loss=0.008836, over 3053202.84 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:19:21,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349846.6666666665, ans=0.1 2023-11-28 04:19:29,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.748e+01 9.346e+01 1.036e+02 1.231e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 04:19:33,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3349913.3333333335, ans=0.04949747468305833 2023-11-28 04:19:43,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502500 2023-11-28 04:19:56,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3350046.6666666665, ans=0.2 2023-11-28 04:19:58,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3350046.6666666665, ans=0.125 2023-11-28 04:20:02,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3350046.6666666665, ans=0.125 2023-11-28 04:20:08,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-28 04:20:15,479 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9550, loss[loss=0.07883, simple_loss=0.1075, pruned_loss=0.0154, audio_tagging_loss=0.009654, over 14286.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09008, pruned_loss=0.01227, audio_tagging_loss=0.00882, over 3052765.68 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:20:38,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3350313.3333333335, ans=0.125 2023-11-28 04:20:39,845 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502550 2023-11-28 04:20:48,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3350313.3333333335, ans=0.0 2023-11-28 04:20:58,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3350380.0, ans=0.125 2023-11-28 04:20:58,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3350380.0, ans=0.125 2023-11-28 04:21:13,657 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9600, loss[loss=0.06084, simple_loss=0.08219, pruned_loss=0.01114, audio_tagging_loss=0.008601, over 14795.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09016, pruned_loss=0.01229, audio_tagging_loss=0.008879, over 3049434.49 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:21:19,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3350513.3333333335, ans=0.0 2023-11-28 04:21:24,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 8.754e+01 9.206e+01 1.000e+02 1.278e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 04:21:37,306 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502600 2023-11-28 04:21:43,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=15.0 2023-11-28 04:21:49,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3350713.3333333335, ans=15.0 2023-11-28 04:22:03,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3350780.0, ans=0.125 2023-11-28 04:22:06,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3350780.0, ans=0.2 2023-11-28 04:22:10,871 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9650, loss[loss=0.0679, simple_loss=0.0906, pruned_loss=0.01499, audio_tagging_loss=0.007611, over 15913.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09041, pruned_loss=0.01236, audio_tagging_loss=0.008881, over 3051679.09 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:22:11,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3350846.6666666665, ans=0.0 2023-11-28 04:22:17,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3350846.6666666665, ans=0.125 2023-11-28 04:22:34,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3350980.0, ans=0.0 2023-11-28 04:22:35,670 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502650 2023-11-28 04:22:49,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3351046.6666666665, ans=0.0 2023-11-28 04:22:56,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3351113.3333333335, ans=0.1 2023-11-28 04:23:07,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3351180.0, ans=0.07 2023-11-28 04:23:08,616 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9700, loss[loss=0.07877, simple_loss=0.1158, pruned_loss=0.01391, audio_tagging_loss=0.00696, over 16535.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09055, pruned_loss=0.01241, audio_tagging_loss=0.008692, over 3052382.18 frames. 
], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:23:21,611 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.660e+01 9.403e+01 1.036e+02 1.751e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 04:23:33,206 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502700 2023-11-28 04:23:50,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3351380.0, ans=0.1 2023-11-28 04:24:06,611 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9750, loss[loss=0.06846, simple_loss=0.08973, pruned_loss=0.01567, audio_tagging_loss=0.00792, over 15492.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09064, pruned_loss=0.01255, audio_tagging_loss=0.008626, over 3052194.27 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:24:18,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.24 vs. limit=22.5 2023-11-28 04:24:21,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3351580.0, ans=0.0 2023-11-28 04:24:27,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-11-28 04:24:30,796 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502750 2023-11-28 04:24:45,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3351713.3333333335, ans=0.1 2023-11-28 04:24:47,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3351713.3333333335, ans=0.2 2023-11-28 04:24:52,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3351780.0, ans=0.2 2023-11-28 04:25:04,292 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9800, loss[loss=0.04945, simple_loss=0.05828, pruned_loss=0.0115, audio_tagging_loss=0.008802, over 15624.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09111, pruned_loss=0.01252, audio_tagging_loss=0.008441, over 3053868.22 frames. 
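
Each model.py:807 line re-asserts Freeze_encoder: False, i.e. encoder parameters continue to receive gradient updates throughout this run. For contrast, a freezing toggle amounts to little more than the following (a generic sketch; "encoder" as an attribute name is an assumption, not the repository's code):

```python
# Generic sketch of what a Freeze_encoder toggle amounts to; the log
# above shows the flag disabled for this run.
def set_encoder_frozen(model, freeze: bool) -> None:
    for p in model.encoder.parameters():
        p.requires_grad_(not freeze)  # frozen params receive no updates
```
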
], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:25:14,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3351913.3333333335, ans=0.0 2023-11-28 04:25:16,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.861e+01 9.508e+01 1.028e+02 1.749e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 04:25:28,312 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502800 2023-11-28 04:25:33,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3351980.0, ans=0.025 2023-11-28 04:25:38,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3352046.6666666665, ans=15.0 2023-11-28 04:25:42,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3352046.6666666665, ans=0.07 2023-11-28 04:25:44,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3352046.6666666665, ans=0.0 2023-11-28 04:25:49,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3352113.3333333335, ans=0.125 2023-11-28 04:25:59,719 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:26:01,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352180.0, ans=0.1 2023-11-28 04:26:01,883 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9850, loss[loss=0.059, simple_loss=0.07972, pruned_loss=0.01222, audio_tagging_loss=0.006919, over 13564.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09051, pruned_loss=0.01235, audio_tagging_loss=0.008504, over 3058324.06 frames. ], batch size: 52, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:26:22,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-28 04:26:25,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3352313.3333333335, ans=0.125 2023-11-28 04:26:26,267 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502850 2023-11-28 04:26:32,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. 
limit=15.0 2023-11-28 04:26:41,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3352380.0, ans=0.125 2023-11-28 04:26:43,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352380.0, ans=0.1 2023-11-28 04:26:56,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3352446.6666666665, ans=0.0 2023-11-28 04:26:59,725 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9900, loss[loss=0.07413, simple_loss=0.09677, pruned_loss=0.01909, audio_tagging_loss=0.006652, over 15146.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09015, pruned_loss=0.01242, audio_tagging_loss=0.008506, over 3056356.31 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:27:02,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3352513.3333333335, ans=0.125 2023-11-28 04:27:12,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.663e+01 9.354e+01 9.948e+01 1.345e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 04:27:15,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3352580.0, ans=0.0 2023-11-28 04:27:23,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502900 2023-11-28 04:27:42,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2023-11-28 04:27:57,210 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9950, loss[loss=0.05393, simple_loss=0.07186, pruned_loss=0.009884, audio_tagging_loss=0.008113, over 13875.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08992, pruned_loss=0.01238, audio_tagging_loss=0.008581, over 3058373.96 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:27:58,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-11-28 04:28:08,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3352913.3333333335, ans=0.125 2023-11-28 04:28:20,932 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502950 2023-11-28 04:28:22,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-28 04:28:52,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-28 04:28:54,842 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10000, loss[loss=0.0657, simple_loss=0.09296, pruned_loss=0.01046, audio_tagging_loss=0.008761, over 15297.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08962, pruned_loss=0.01224, audio_tagging_loss=0.00858, over 3054960.73 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:29:00,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3353180.0, ans=0.0 2023-11-28 04:29:08,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.771e+01 9.442e+01 1.017e+02 1.444e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 04:29:18,714 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503000 2023-11-28 04:29:27,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3353313.3333333335, ans=0.125 2023-11-28 04:29:52,378 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10050, loss[loss=0.0839, simple_loss=0.1257, pruned_loss=0.01583, audio_tagging_loss=0.005224, over 15004.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08978, pruned_loss=0.01231, audio_tagging_loss=0.00858, over 3052805.86 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:29:53,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2023-11-28 04:30:14,854 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:30:17,444 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503050 2023-11-28 04:30:28,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3353713.3333333335, ans=0.125 2023-11-28 04:30:41,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3353780.0, ans=0.1 2023-11-28 04:30:42,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3353780.0, ans=0.0 2023-11-28 04:30:50,287 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10100, loss[loss=0.05502, simple_loss=0.06659, pruned_loss=0.009066, audio_tagging_loss=0.01266, over 15065.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08985, pruned_loss=0.01224, audio_tagging_loss=0.008679, over 3057586.55 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:30:53,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3353846.6666666665, ans=0.125 2023-11-28 04:30:56,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-28 04:31:04,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.581e+01 9.372e+01 1.014e+02 1.280e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 04:31:14,656 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503100 2023-11-28 04:31:26,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3354046.6666666665, ans=0.0 2023-11-28 04:31:39,631 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:31:40,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3354113.3333333335, ans=0.0 2023-11-28 04:31:48,554 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10150, loss[loss=0.06126, simple_loss=0.08125, pruned_loss=0.01281, audio_tagging_loss=0.007824, over 15505.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08955, pruned_loss=0.01215, audio_tagging_loss=0.008671, over 3059507.73 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:32:06,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3354246.6666666665, ans=0.0 2023-11-28 04:32:11,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3354313.3333333335, ans=0.09899494936611666 2023-11-28 04:32:12,523 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503150 2023-11-28 04:32:18,933 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:32:20,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3354313.3333333335, ans=0.125 2023-11-28 04:32:45,388 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10200, loss[loss=0.05213, simple_loss=0.06045, pruned_loss=0.009561, audio_tagging_loss=0.01235, over 15189.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08895, pruned_loss=0.01204, audio_tagging_loss=0.008838, over 3054707.54 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:32:59,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.630e+01 9.209e+01 1.011e+02 1.470e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 04:33:00,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3354580.0, ans=0.2 2023-11-28 04:33:07,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3354646.6666666665, ans=0.04949747468305833 2023-11-28 04:33:09,094 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503200 2023-11-28 04:33:10,767 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
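
The '▁' characters in these token dumps are SentencePiece's word-boundary marker, not an encoding artifact, and the 24-token count that trips the exclusion rule comes straight from BPE-encoding the dummy transcript. A sketch of reproducing the dump (the model path below is illustrative):

```python
# Sketch of reproducing the token dump; '▁' is SentencePiece's
# word-boundary marker, not mojibake. Model path is illustrative.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="path/to/bpe.model")
pieces = sp.encode("Dummy text added as a place holder. "
                   "Please ignore this if possible.", out_type=str)
print(pieces, len(pieces))  # with the run's BPE model: the 24 pieces logged
```
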
Number of tokens: 24 2023-11-28 04:33:26,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3354713.3333333335, ans=0.2 2023-11-28 04:33:31,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-28 04:33:34,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3354780.0, ans=0.0 2023-11-28 04:33:37,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3354780.0, ans=0.125 2023-11-28 04:33:39,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3354780.0, ans=0.035 2023-11-28 04:33:41,816 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10250, loss[loss=0.0694, simple_loss=0.09452, pruned_loss=0.01367, audio_tagging_loss=0.008462, over 14403.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08956, pruned_loss=0.01223, audio_tagging_loss=0.008894, over 3057331.25 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:33:51,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3354913.3333333335, ans=0.2 2023-11-28 04:34:05,866 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503250 2023-11-28 04:34:14,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3354980.0, ans=0.0 2023-11-28 04:34:25,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3355113.3333333335, ans=0.0 2023-11-28 04:34:33,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3355113.3333333335, ans=0.125 2023-11-28 04:34:38,536 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10300, loss[loss=0.0633, simple_loss=0.08239, pruned_loss=0.01233, audio_tagging_loss=0.00977, over 15375.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08969, pruned_loss=0.01212, audio_tagging_loss=0.00891, over 3057502.95 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:34:43,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3355180.0, ans=0.0 2023-11-28 04:34:51,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 9.001e+01 9.538e+01 1.014e+02 1.211e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 04:34:52,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-28 04:35:02,832 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503300 2023-11-28 04:35:28,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3355446.6666666665, ans=0.125 2023-11-28 04:35:31,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3355446.6666666665, ans=0.09899494936611666 2023-11-28 04:35:35,742 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10350, loss[loss=0.06486, simple_loss=0.08795, pruned_loss=0.01161, audio_tagging_loss=0.009274, over 14414.00 frames. 
], tot_loss[loss=0.06625, simple_loss=0.09037, pruned_loss=0.01212, audio_tagging_loss=0.008941, over 3058758.19 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:35:43,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0 2023-11-28 04:35:49,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=22.5 2023-11-28 04:35:59,217 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503350 2023-11-28 04:36:12,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3355713.3333333335, ans=0.125 2023-11-28 04:36:13,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3355713.3333333335, ans=0.04949747468305833 2023-11-28 04:36:32,724 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10400, loss[loss=0.07931, simple_loss=0.1164, pruned_loss=0.01311, audio_tagging_loss=0.008004, over 13660.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09011, pruned_loss=0.01215, audio_tagging_loss=0.009009, over 3055206.51 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:36:40,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-11-28 04:36:46,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3355913.3333333335, ans=0.125 2023-11-28 04:36:47,554 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.770e+01 9.452e+01 1.025e+02 1.480e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 04:36:52,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-11-28 04:36:56,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3355980.0, ans=0.125 2023-11-28 04:36:56,905 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503400 2023-11-28 04:37:16,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3356046.6666666665, ans=0.07 2023-11-28 04:37:30,461 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10450, loss[loss=0.06641, simple_loss=0.09156, pruned_loss=0.00929, audio_tagging_loss=0.01134, over 15141.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.0901, pruned_loss=0.01211, audio_tagging_loss=0.008939, over 3053639.92 frames. 
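
grad_scale in these records steps among 32.0, 16.0 and 8.0. That is the signature of dynamic fp16 loss scaling: the scaler halves its scale whenever a step produces inf/NaN gradients and grows it back after a run of clean steps. A sketch using PyTorch's GradScaler (the constructor values are illustrative, not this run's actual settings):

```python
# Sketch of dynamic fp16 loss scaling with PyTorch's GradScaler; the
# constructor values are illustrative.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, backoff_factor=0.5, growth_factor=2.0, growth_interval=2000
)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # internally skipped if gradients overflowed
    scaler.update()         # halves the scale on overflow, grows it later
    return scaler.get_scale()
```
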
], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:37:44,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3356246.6666666665, ans=0.1 2023-11-28 04:37:55,349 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503450 2023-11-28 04:38:03,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3356313.3333333335, ans=0.125 2023-11-28 04:38:11,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3356380.0, ans=0.0 2023-11-28 04:38:28,237 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10500, loss[loss=0.05456, simple_loss=0.07356, pruned_loss=0.01023, audio_tagging_loss=0.007556, over 14624.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08931, pruned_loss=0.01215, audio_tagging_loss=0.008834, over 3050155.13 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:38:36,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3356513.3333333335, ans=0.0 2023-11-28 04:38:43,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.671e+01 9.492e+01 1.004e+02 1.311e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 04:38:52,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503500 2023-11-28 04:39:25,948 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10550, loss[loss=0.08115, simple_loss=0.1059, pruned_loss=0.01771, audio_tagging_loss=0.01051, over 15991.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08955, pruned_loss=0.01235, audio_tagging_loss=0.008733, over 3050085.24 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:39:39,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=22.5 2023-11-28 04:39:40,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0 2023-11-28 04:39:44,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3356913.3333333335, ans=0.0 2023-11-28 04:39:46,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3356913.3333333335, ans=0.125 2023-11-28 04:39:47,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-28 04:39:49,577 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503550 2023-11-28 04:39:51,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3356980.0, ans=0.125 2023-11-28 04:40:04,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3357046.6666666665, ans=0.0 2023-11-28 04:40:14,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3357113.3333333335, ans=0.0 2023-11-28 04:40:22,838 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10600, loss[loss=0.07824, simple_loss=0.1071, pruned_loss=0.01533, audio_tagging_loss=0.009359, over 15417.00 frames. 
2023-11-28 04:40:30,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3357180.0, ans=0.125
2023-11-28 04:40:31,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357180.0, ans=0.1
2023-11-28 04:40:35,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3357246.6666666665, ans=0.09899494936611666
2023-11-28 04:40:37,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.827e+01 9.555e+01 1.028e+02 1.264e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-28 04:40:48,210 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503600
2023-11-28 04:40:53,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3357313.3333333335, ans=0.2
2023-11-28 04:40:54,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3357313.3333333335, ans=0.0
2023-11-28 04:41:06,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3357380.0, ans=0.0
2023-11-28 04:41:07,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3357380.0, ans=0.0
2023-11-28 04:41:11,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3357446.6666666665, ans=0.0
2023-11-28 04:41:21,487 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10650, loss[loss=0.06492, simple_loss=0.09218, pruned_loss=0.01005, audio_tagging_loss=0.008776, over 14620.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08979, pruned_loss=0.01239, audio_tagging_loss=0.008521, over 3054865.11 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:41:21,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3357513.3333333335, ans=0.2
2023-11-28 04:41:23,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0
2023-11-28 04:41:30,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357513.3333333335, ans=0.1
2023-11-28 04:41:36,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0
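[Note] Each Whitening record compares a per-module covariance statistic against a limit: the metric is 1.0 when the feature covariance is proportional to the identity ("white") and approaches num_channels as the energy collapses onto one direction, so metric=12.73 vs. limit=15.0 above means the whitening constraint is inactive there. One way to write such a metric for a single group (a sketch under that interpretation, not necessarily scaling.py's exact code):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels); returns 1.0 for an isotropic covariance,
        # up to num_channels when one direction dominates
        x = x - x.mean(dim=0)
        cov = x.T @ x / x.shape[0]
        num_channels = cov.shape[0]
        return (num_channels * (cov * cov).sum() / cov.diag().sum() ** 2).item()

    print(whitening_metric(torch.randn(10000, 192)))  # close to 1.0 for white noise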
2023-11-28 04:41:38,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3357580.0, ans=0.125
2023-11-28 04:41:46,307 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503650
2023-11-28 04:41:50,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3357646.6666666665, ans=0.0
2023-11-28 04:42:03,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3357713.3333333335, ans=0.0
2023-11-28 04:42:07,016 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 04:42:20,147 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10700, loss[loss=0.07855, simple_loss=0.1073, pruned_loss=0.0171, audio_tagging_loss=0.007786, over 15515.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0905, pruned_loss=0.0126, audio_tagging_loss=0.008513, over 3044257.05 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 8.0
2023-11-28 04:42:27,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3357846.6666666665, ans=0.125
2023-11-28 04:42:34,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3357913.3333333335, ans=0.125
2023-11-28 04:42:35,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.497e+01 9.278e+01 9.975e+01 1.438e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-28 04:42:43,729 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503700
2023-11-28 04:42:46,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3357980.0, ans=0.125
2023-11-28 04:42:49,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3357980.0, ans=0.125
2023-11-28 04:42:51,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3357980.0, ans=10.0
2023-11-28 04:43:16,261 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10750, loss[loss=0.06871, simple_loss=0.1008, pruned_loss=0.01072, audio_tagging_loss=0.007598, over 14895.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09071, pruned_loss=0.01244, audio_tagging_loss=0.008525, over 3053580.40 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0
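[Note] The ScheduledFloat records print the current value (ans=...) of hyperparameters that are scheduled against batch_count by piecewise-linear interpolation between breakpoints; at batch_count around 3.36e6 every schedule shown here has long since settled at its final value (dropout_p=0.1, balancer prob=0.125, and so on). A sketch of the interpolation rule (the breakpoints below are illustrative, not the ones used for these modules):

    import numpy as np

    def scheduled_float(batch_count: float,
                        points=((0.0, 0.3), (20000.0, 0.125))) -> float:
        # piecewise-linear in batch_count, clamped to the end values outside the breakpoints
        xs, ys = zip(*points)
        return float(np.interp(batch_count, xs, ys))

    print(scheduled_float(3357580.0))  # 0.125, as in the balancer prob records above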
2023-11-28 04:43:31,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3358246.6666666665, ans=0.0
2023-11-28 04:43:31,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3358246.6666666665, ans=0.125
2023-11-28 04:43:40,948 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503750
2023-11-28 04:43:41,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3358313.3333333335, ans=0.125
2023-11-28 04:43:43,755 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 04:43:57,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3358380.0, ans=0.0
2023-11-28 04:44:03,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3358446.6666666665, ans=0.125
2023-11-28 04:44:05,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3358446.6666666665, ans=0.125
2023-11-28 04:44:13,537 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10800, loss[loss=0.08154, simple_loss=0.1119, pruned_loss=0.01803, audio_tagging_loss=0.007588, over 15767.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0911, pruned_loss=0.01261, audio_tagging_loss=0.00851, over 3054749.29 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:44:30,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.815e+01 9.428e+01 9.959e+01 1.276e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-28 04:44:37,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3358646.6666666665, ans=0.125
2023-11-28 04:44:38,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503800
2023-11-28 04:44:38,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3358646.6666666665, ans=0.1
2023-11-28 04:44:41,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0
2023-11-28 04:44:49,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3358713.3333333335, ans=0.0
2023-11-28 04:44:58,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3358713.3333333335, ans=0.2
2023-11-28 04:45:12,744 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10850, loss[loss=0.066, simple_loss=0.08336, pruned_loss=0.01387, audio_tagging_loss=0.01045, over 15423.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09021, pruned_loss=0.01248, audio_tagging_loss=0.00866, over 3055855.81 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
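[Note] The grad_scale column is the fp16 loss-scaling factor, and its movement between 8.0, 16.0 and 32.0 in these records follows the usual dynamic-loss-scaling rule: halve after a step whose scaled gradients overflow, double again after a long enough run of clean steps. A generic PyTorch sketch of that dynamic (not the trainer's own loop; the GradScaler settings are assumptions, and a CUDA device is required):

    import torch

    model = torch.nn.Linear(10, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, backoff_factor=0.5,
                                       growth_factor=2.0, growth_interval=2000)

    for step in range(4):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(8, 10, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped when the unscaled grads contain inf/NaN
        scaler.update()         # halves the scale on overflow, doubles after growth_interval clean steps
        print(step, scaler.get_scale())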
2023-11-28 04:45:17,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3358846.6666666665, ans=0.125
2023-11-28 04:45:24,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3358913.3333333335, ans=0.2
2023-11-28 04:45:35,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=22.5
2023-11-28 04:45:36,405 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503850
2023-11-28 04:46:03,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3359113.3333333335, ans=0.0
2023-11-28 04:46:04,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3359113.3333333335, ans=0.125
2023-11-28 04:46:09,853 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10900, loss[loss=0.07669, simple_loss=0.1062, pruned_loss=0.01673, audio_tagging_loss=0.006838, over 14720.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08992, pruned_loss=0.01232, audio_tagging_loss=0.008702, over 3050936.36 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:46:09,868 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 04:46:25,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.788e+01 9.283e+01 9.844e+01 1.254e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-28 04:46:34,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503900
2023-11-28 04:46:36,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3359313.3333333335, ans=0.0
2023-11-28 04:46:40,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3359313.3333333335, ans=0.125
2023-11-28 04:47:07,423 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10950, loss[loss=0.04069, simple_loss=0.05107, pruned_loss=0.005272, audio_tagging_loss=0.009882, over 15182.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09005, pruned_loss=0.01229, audio_tagging_loss=0.008723, over 3043835.47 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:47:12,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0
2023-11-28 04:47:31,931 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503950
2023-11-28 04:47:47,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0
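[Note] The WARNING above drops an AudioSet cut whose 1-second clip yields only 23 encoder frames after subsampling while its placeholder transcript is 24 tokens long: a transducer loss has no valid monotonic alignment when the encoder output is shorter than the label sequence, so such cuts are excluded. A sketch of that check (the function name is illustrative; train_asr.py's actual filter may apply extra margins):

    def should_exclude(frames_after_subsampling: int, num_tokens: int) -> bool:
        # no monotonic alignment exists if the encoder emits fewer frames than tokens
        return frames_after_subsampling < num_tokens

    print(should_exclude(23, 24))  # True, matching the excluded cut above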
2023-11-28 04:47:49,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3359713.3333333335, ans=0.125
2023-11-28 04:47:56,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3359780.0, ans=0.1
2023-11-28 04:47:56,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3359780.0, ans=0.125
2023-11-28 04:48:01,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3359780.0, ans=0.125
2023-11-28 04:48:05,130 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11000, loss[loss=0.067, simple_loss=0.1, pruned_loss=0.0102, audio_tagging_loss=0.006801, over 15328.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08964, pruned_loss=0.01205, audio_tagging_loss=0.008749, over 3045879.30 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:48:12,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3359846.6666666665, ans=0.2
2023-11-28 04:48:17,903 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 04:48:21,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.488e+01 9.034e+01 9.756e+01 1.163e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-28 04:48:26,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3359980.0, ans=0.125
2023-11-28 04:48:29,537 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504000
2023-11-28 04:48:35,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3359980.0, ans=0.0
2023-11-28 04:48:35,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3359980.0, ans=0.2
2023-11-28 04:48:44,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3360046.6666666665, ans=0.125
2023-11-28 04:48:53,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3360113.3333333335, ans=0.0
2023-11-28 04:48:59,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3360113.3333333335, ans=0.0
2023-11-28 04:49:02,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3360113.3333333335, ans=0.1
2023-11-28 04:49:05,274 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11050, loss[loss=0.06653, simple_loss=0.09317, pruned_loss=0.01016, audio_tagging_loss=0.009785, over 15353.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09069, pruned_loss=0.01231, audio_tagging_loss=0.008886, over 3053369.69 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:49:06,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0
2023-11-28 04:49:07,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3360180.0, ans=0.1
2023-11-28 04:49:16,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=10.0
2023-11-28 04:49:28,468 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504050
2023-11-28 04:49:37,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3360313.3333333335, ans=0.0
2023-11-28 04:49:53,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3360446.6666666665, ans=0.5
2023-11-28 04:50:02,375 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11100, loss[loss=0.09823, simple_loss=0.1409, pruned_loss=0.01956, audio_tagging_loss=0.008206, over 16227.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08966, pruned_loss=0.01213, audio_tagging_loss=0.008982, over 3061415.60 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:50:03,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3360513.3333333335, ans=0.125
2023-11-28 04:50:12,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3360580.0, ans=0.0
2023-11-28 04:50:18,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.723e+01 9.489e+01 1.017e+02 2.061e+02, threshold=1.898e+02, percent-clipped=1.0
2023-11-28 04:50:25,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0
2023-11-28 04:50:26,306 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504100
2023-11-28 04:50:59,699 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11150, loss[loss=0.06044, simple_loss=0.08854, pruned_loss=0.008506, audio_tagging_loss=0.007662, over 14512.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08908, pruned_loss=0.01197, audio_tagging_loss=0.009154, over 3053222.77 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:51:20,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3360913.3333333335, ans=0.04949747468305833
2023-11-28 04:51:23,872 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504150
2023-11-28 04:51:25,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3360980.0, ans=0.125
2023-11-28 04:51:35,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=22.5
2023-11-28 04:51:44,799 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 04:51:45,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5
2023-11-28 04:51:47,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3361113.3333333335, ans=0.125
2023-11-28 04:51:51,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0
2023-11-28 04:51:57,689 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11200, loss[loss=0.05396, simple_loss=0.07542, pruned_loss=0.009459, audio_tagging_loss=0.00679, over 14309.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08859, pruned_loss=0.0119, audio_tagging_loss=0.009269, over 3055340.83 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0
2023-11-28 04:51:57,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3361180.0, ans=0.2
2023-11-28 04:51:59,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3361180.0, ans=0.1
2023-11-28 04:52:13,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 8.826e+01 9.324e+01 1.011e+02 1.372e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-28 04:52:15,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.54 vs. limit=10.0
2023-11-28 04:52:17,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=15.0
2023-11-28 04:52:21,302 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504200
2023-11-28 04:52:38,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3361380.0, ans=0.125
2023-11-28 04:52:42,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3361446.6666666665, ans=0.125
2023-11-28 04:52:44,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3361446.6666666665, ans=0.1
2023-11-28 04:52:55,503 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11250, loss[loss=0.07938, simple_loss=0.1009, pruned_loss=0.01748, audio_tagging_loss=0.01144, over 15868.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.0881, pruned_loss=0.01189, audio_tagging_loss=0.009307, over 3052982.64 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0
2023-11-28 04:53:01,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3361513.3333333335, ans=0.125
2023-11-28 04:53:19,160 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504250
2023-11-28 04:53:33,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3361713.3333333335, ans=0.125
2023-11-28 04:53:52,337 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11300, loss[loss=0.07011, simple_loss=0.1034, pruned_loss=0.01194, audio_tagging_loss=0.006455, over 15258.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08865, pruned_loss=0.01201, audio_tagging_loss=0.009048, over 3048336.90 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:53:55,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0
2023-11-28 04:54:04,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3361913.3333333335, ans=0.125
2023-11-28 04:54:09,270 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.810e+01 9.312e+01 1.008e+02 1.209e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-28 04:54:16,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504300
2023-11-28 04:54:36,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3362046.6666666665, ans=0.0
2023-11-28 04:54:43,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0
2023-11-28 04:54:44,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3362113.3333333335, ans=0.07
2023-11-28 04:54:50,070 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11350, loss[loss=0.07613, simple_loss=0.1069, pruned_loss=0.01477, audio_tagging_loss=0.007901, over 16058.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08857, pruned_loss=0.01204, audio_tagging_loss=0.008855, over 3046734.61 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:54:57,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3362180.0, ans=0.0
2023-11-28 04:55:01,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3362246.6666666665, ans=15.0
2023-11-28 04:55:06,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5
2023-11-28 04:55:09,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0
2023-11-28 04:55:14,372 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504350
2023-11-28 04:55:24,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=12.0
2023-11-28 04:55:48,170 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11400, loss[loss=0.04702, simple_loss=0.06392, pruned_loss=0.008472, audio_tagging_loss=0.006594, over 16490.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08833, pruned_loss=0.01202, audio_tagging_loss=0.00881, over 3052282.23 frames. ], batch size: 67, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:55:50,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3362513.3333333335, ans=0.125
2023-11-28 04:55:51,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3362513.3333333335, ans=0.0
2023-11-28 04:56:01,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3362580.0, ans=0.125
2023-11-28 04:56:05,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.951e+01 9.530e+01 1.041e+02 1.873e+02, threshold=1.906e+02, percent-clipped=1.0
2023-11-28 04:56:05,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3362580.0, ans=0.125
2023-11-28 04:56:08,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3362580.0, ans=0.0
2023-11-28 04:56:12,141 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504400
2023-11-28 04:56:22,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=12.0
2023-11-28 04:56:45,791 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11450, loss[loss=0.06088, simple_loss=0.08008, pruned_loss=0.0132, audio_tagging_loss=0.007645, over 15681.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08875, pruned_loss=0.01225, audio_tagging_loss=0.008808, over 3052092.84 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-28 04:56:53,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3362846.6666666665, ans=0.125
2023-11-28 04:57:07,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3362980.0, ans=0.125
2023-11-28 04:57:09,830 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504450
2023-11-28 04:57:25,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3363046.6666666665, ans=0.0
2023-11-28 04:57:33,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3363113.3333333335, ans=0.125
2023-11-28 04:57:37,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3363113.3333333335, ans=0.025
2023-11-28 04:57:43,781 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11500, loss[loss=0.05715, simple_loss=0.0797, pruned_loss=0.009013, audio_tagging_loss=0.008284, over 16038.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.0891, pruned_loss=0.01226, audio_tagging_loss=0.008809, over 3046562.88 frames. ], batch size: 63, lr: 1.60e-03, grad_scale: 8.0
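[Note] tot_loss[...] is a frame-weighted running average of the component losses; the fractional frame counts (for example "over 3046562.88 frames.") suggest an exponentially decayed sum rather than a hard window, which with batches of roughly 15k frames saturates near the 3e6-frame totals seen here. A sketch of such an average (the decay constant is illustrative, not the trainer's actual setting):

    class RunningLoss:
        """Exponentially decayed, frame-weighted loss average (illustrative)."""

        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss: float, num_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frame_sum = self.decay * self.frame_sum + num_frames

        def value(self) -> float:
            return self.loss_sum / self.frame_sum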
2023-11-28 04:57:52,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3363180.0, ans=0.0
2023-11-28 04:58:02,620 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.810e+01 9.465e+01 1.017e+02 1.248e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-28 04:58:08,092 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504500
2023-11-28 04:58:18,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3363380.0, ans=0.1
2023-11-28 04:58:23,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=22.5
2023-11-28 04:58:34,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3363446.6666666665, ans=0.0
2023-11-28 04:58:40,739 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11550, loss[loss=0.05864, simple_loss=0.07796, pruned_loss=0.009928, audio_tagging_loss=0.009734, over 15221.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08985, pruned_loss=0.01229, audio_tagging_loss=0.008739, over 3052588.75 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-28 04:58:57,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=22.5
2023-11-28 04:59:05,943 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504550
2023-11-28 04:59:12,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3363646.6666666665, ans=0.125
2023-11-28 04:59:19,000 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 04:59:38,806 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11600, loss[loss=0.06586, simple_loss=0.08793, pruned_loss=0.01421, audio_tagging_loss=0.007685, over 14906.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09051, pruned_loss=0.01234, audio_tagging_loss=0.008744, over 3057754.53 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 04:59:42,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3363846.6666666665, ans=0.125
2023-11-28 04:59:57,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.620e+01 9.416e+01 1.017e+02 1.407e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-28 05:00:02,709 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504600
2023-11-28 05:00:08,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3363980.0, ans=0.0
2023-11-28 05:00:33,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3364113.3333333335, ans=0.125
2023-11-28 05:00:36,727 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11650, loss[loss=0.07463, simple_loss=0.09909, pruned_loss=0.01588, audio_tagging_loss=0.009206, over 14734.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09078, pruned_loss=0.01239, audio_tagging_loss=0.00875, over 3052913.40 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:00:43,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3364180.0, ans=0.125
2023-11-28 05:00:46,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3364246.6666666665, ans=0.2
2023-11-28 05:00:50,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3364246.6666666665, ans=6.0
2023-11-28 05:00:59,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0
2023-11-28 05:01:01,216 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504650
2023-11-28 05:01:09,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=22.5
2023-11-28 05:01:11,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3364380.0, ans=0.0
2023-11-28 05:01:33,592 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11700, loss[loss=0.06881, simple_loss=0.08377, pruned_loss=0.01465, audio_tagging_loss=0.01228, over 15993.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08987, pruned_loss=0.01236, audio_tagging_loss=0.008759, over 3046656.58 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:01:37,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0
2023-11-28 05:01:42,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3364513.3333333335, ans=0.0
2023-11-28 05:01:52,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.810e+01 9.366e+01 1.007e+02 1.398e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-28 05:01:58,245 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504700
2023-11-28 05:02:00,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3364646.6666666665, ans=0.125
2023-11-28 05:02:30,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3364846.6666666665, ans=0.125
2023-11-28 05:02:31,532 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11750, loss[loss=0.09761, simple_loss=0.1332, pruned_loss=0.02334, audio_tagging_loss=0.007695, over 14387.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08941, pruned_loss=0.01228, audio_tagging_loss=0.008791, over 3039139.62 frames. ], batch size: 52, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:02:36,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3364846.6666666665, ans=0.125
2023-11-28 05:02:39,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.54 vs. limit=22.5
2023-11-28 05:02:41,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3364846.6666666665, ans=0.2
2023-11-28 05:02:50,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0
2023-11-28 05:02:55,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504750
2023-11-28 05:02:55,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3364980.0, ans=0.02
2023-11-28 05:03:07,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.92 vs. limit=10.0
2023-11-28 05:03:18,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3365113.3333333335, ans=0.0
2023-11-28 05:03:29,532 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11800, loss[loss=0.07215, simple_loss=0.09752, pruned_loss=0.01389, audio_tagging_loss=0.009501, over 14215.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08904, pruned_loss=0.01224, audio_tagging_loss=0.00881, over 3037967.48 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:03:30,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.00 vs. limit=22.5
2023-11-28 05:03:39,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3365246.6666666665, ans=0.0
2023-11-28 05:03:43,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3365246.6666666665, ans=0.1
2023-11-28 05:03:47,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.611e+01 9.542e+01 1.045e+02 1.429e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-28 05:03:47,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3365246.6666666665, ans=0.0
2023-11-28 05:03:49,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3365246.6666666665, ans=0.125
2023-11-28 05:03:53,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504800
2023-11-28 05:04:13,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3365380.0, ans=0.09899494936611666
2023-11-28 05:04:21,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3365446.6666666665, ans=0.07
2023-11-28 05:04:23,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365446.6666666665, ans=0.1
2023-11-28 05:04:26,612 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11850, loss[loss=0.05721, simple_loss=0.0742, pruned_loss=0.01056, audio_tagging_loss=0.009552, over 15309.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08887, pruned_loss=0.0122, audio_tagging_loss=0.008859, over 3051625.54 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:04:35,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3365513.3333333335, ans=0.125
2023-11-28 05:04:35,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3365513.3333333335, ans=0.125
2023-11-28 05:04:51,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504850
2023-11-28 05:04:51,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3365646.6666666665, ans=0.125
2023-11-28 05:05:04,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0
2023-11-28 05:05:13,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3365780.0, ans=0.125
2023-11-28 05:05:13,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=22.5
2023-11-28 05:05:19,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3365780.0, ans=0.1
2023-11-28 05:05:24,489 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11900, loss[loss=0.06323, simple_loss=0.07585, pruned_loss=0.01618, audio_tagging_loss=0.00912, over 15188.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08793, pruned_loss=0.012, audio_tagging_loss=0.008924, over 3048305.89 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:05:24,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3365846.6666666665, ans=0.0
2023-11-28 05:05:43,448 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.747e+01 9.488e+01 1.023e+02 1.658e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-28 05:05:44,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3365913.3333333335, ans=0.2
2023-11-28 05:05:49,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504900
2023-11-28 05:05:55,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3365980.0, ans=0.125
2023-11-28 05:06:03,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3366046.6666666665, ans=0.125
2023-11-28 05:06:03,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3366046.6666666665, ans=0.04949747468305833
2023-11-28 05:06:23,039 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11950, loss[loss=0.06798, simple_loss=0.09628, pruned_loss=0.01205, audio_tagging_loss=0.007792, over 16176.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08882, pruned_loss=0.01223, audio_tagging_loss=0.008999, over 3043054.95 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
2023-11-28 05:06:24,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3366180.0, ans=0.02
2023-11-28 05:06:28,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2023-11-28 05:06:29,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3366180.0, ans=0.1
2023-11-28 05:06:37,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3366246.6666666665, ans=0.125
2023-11-28 05:06:39,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3366246.6666666665, ans=0.2
2023-11-28 05:06:43,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3366246.6666666665, ans=0.125
2023-11-28 05:06:46,942 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504950
2023-11-28 05:07:16,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3366446.6666666665, ans=0.2
2023-11-28 05:07:19,251 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 12000, loss[loss=0.05835, simple_loss=0.06821, pruned_loss=0.01132, audio_tagging_loss=0.01293, over 14990.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08852, pruned_loss=0.01218, audio_tagging_loss=0.009131, over 3044203.38 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0
2023-11-28 05:07:19,251 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-28 05:07:38,086 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2022, 3.9906, 3.7247, 3.2500], device='cuda:3')
2023-11-28 05:07:54,224 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05822, simple_loss=0.05066, pruned_loss=0.005316, audio_tagging_loss=0.02757, over 4681554.00 frames.
2023-11-28 05:07:54,225 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-28 05:08:03,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3366513.3333333335, ans=10.0
2023-11-28 05:08:08,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3366580.0, ans=0.0
2023-11-28 05:08:11,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.775e+01 9.473e+01 1.010e+02 1.187e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-28 05:08:16,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505000
2023-11-28 05:08:35,697 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 0, loss[loss=0.08567, simple_loss=0.1028, pruned_loss=0.01597, audio_tagging_loss=0.0183, over 16043.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1028, pruned_loss=0.01597, audio_tagging_loss=0.0183, over 16043.00 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:08:35,698 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-28 05:08:50,539 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.4587, 6.2496, 6.0002, 6.0138], device='cuda:3')
2023-11-28 05:09:10,065 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.05773, simple_loss=0.0506, pruned_loss=0.005225, audio_tagging_loss=0.0272, over 4681554.00 frames.
2023-11-28 05:09:10,066 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-28 05:09:12,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0
2023-11-28 05:09:25,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3366740.0, ans=0.125
2023-11-28 05:09:27,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3366740.0, ans=0.125
2023-11-28 05:09:41,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3366806.6666666665, ans=0.125
2023-11-28 05:09:49,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3366873.3333333335, ans=0.125
2023-11-28 05:10:04,078 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505050
2023-11-28 05:10:07,276 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 50, loss[loss=0.07834, simple_loss=0.08935, pruned_loss=0.01527, audio_tagging_loss=0.0184, over 14665.00 frames. ], tot_loss[loss=0.07388, simple_loss=0.08971, pruned_loss=0.01237, audio_tagging_loss=0.01665, over 680109.46 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0
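[Note] The two validation passes above (batch 12000 of epoch 42 and batch 0 of epoch 43) run over the same fixed dev set (4681554.00 frames), and the learning rate steps from 1.60e-03 to 1.58e-03 at the epoch boundary. During validation the zipformer.py records also dump the entropy of selected self-attention weight matrices, one value per head (four heads in each tensor shown): near-uniform attention gives entropy close to log(seq_len), while sharply peaked attention drives it toward zero. A sketch of that diagnostic (illustrative, not zipformer.py's exact reduction):

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len), rows summing to 1;
        # mean row entropy per head, as in the tensors logged above
        return -(attn * attn.clamp_min(1e-20).log()).sum(dim=-1).mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_weights_entropy(attn))  # four values, one per head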
], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:10:20,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3367073.3333333335, ans=0.125 2023-11-28 05:10:24,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3367073.3333333335, ans=0.125 2023-11-28 05:10:26,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367073.3333333335, ans=0.1 2023-11-28 05:10:41,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3367206.6666666665, ans=0.125 2023-11-28 05:10:45,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3367206.6666666665, ans=0.0 2023-11-28 05:10:56,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.375e+01 9.586e+01 1.037e+02 1.129e+02 1.417e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-28 05:11:01,172 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505100 2023-11-28 05:11:04,363 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 100, loss[loss=0.06265, simple_loss=0.07118, pruned_loss=0.0102, audio_tagging_loss=0.01686, over 15608.00 frames. ], tot_loss[loss=0.07417, simple_loss=0.09085, pruned_loss=0.01287, audio_tagging_loss=0.01588, over 1204415.81 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:11:11,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3367340.0, ans=0.5 2023-11-28 05:11:13,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2023-11-28 05:11:22,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3367406.6666666665, ans=0.125 2023-11-28 05:11:23,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2023-11-28 05:11:24,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3367406.6666666665, ans=0.125 2023-11-28 05:11:38,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3367540.0, ans=0.95 2023-11-28 05:11:39,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3367540.0, ans=0.05 2023-11-28 05:11:48,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. 
limit=15.0 2023-11-28 05:11:51,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3367606.6666666665, ans=0.125 2023-11-28 05:11:57,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3367606.6666666665, ans=0.2 2023-11-28 05:11:58,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505150 2023-11-28 05:12:01,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2023-11-28 05:12:02,497 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 150, loss[loss=0.08039, simple_loss=0.1107, pruned_loss=0.01339, audio_tagging_loss=0.01166, over 15834.00 frames. ], tot_loss[loss=0.07277, simple_loss=0.09122, pruned_loss=0.01292, audio_tagging_loss=0.01424, over 1610321.16 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:12:16,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2023-11-28 05:12:18,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3367740.0, ans=0.035 2023-11-28 05:12:19,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367740.0, ans=0.1 2023-11-28 05:12:41,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3367873.3333333335, ans=0.0 2023-11-28 05:12:49,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3367940.0, ans=0.2 2023-11-28 05:12:50,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3367940.0, ans=0.0 2023-11-28 05:12:52,801 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 9.055e+01 9.611e+01 1.032e+02 1.243e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 05:12:54,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3367940.0, ans=0.125 2023-11-28 05:12:57,233 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505200 2023-11-28 05:13:01,149 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 200, loss[loss=0.06917, simple_loss=0.09409, pruned_loss=0.007864, audio_tagging_loss=0.01426, over 15794.00 frames. ], tot_loss[loss=0.07097, simple_loss=0.09126, pruned_loss=0.01262, audio_tagging_loss=0.01272, over 1930012.25 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:13:21,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3368073.3333333335, ans=0.2 2023-11-28 05:13:39,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3368206.6666666665, ans=0.1 2023-11-28 05:13:54,364 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505250 2023-11-28 05:13:57,686 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 250, loss[loss=0.07161, simple_loss=0.1006, pruned_loss=0.01221, audio_tagging_loss=0.009108, over 15469.00 frames. 
], tot_loss[loss=0.07046, simple_loss=0.09268, pruned_loss=0.01264, audio_tagging_loss=0.01148, over 2183872.24 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:14:05,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3368340.0, ans=0.125 2023-11-28 05:14:22,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3368473.3333333335, ans=0.5 2023-11-28 05:14:23,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-28 05:14:23,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-28 05:14:32,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3368540.0, ans=0.125 2023-11-28 05:14:48,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 9.024e+01 9.625e+01 1.027e+02 1.223e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 05:14:51,708 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505300 2023-11-28 05:14:55,484 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 300, loss[loss=0.06633, simple_loss=0.09156, pruned_loss=0.01122, audio_tagging_loss=0.009331, over 15169.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09299, pruned_loss=0.01276, audio_tagging_loss=0.01055, over 2379149.51 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:14:58,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3368673.3333333335, ans=0.125 2023-11-28 05:15:02,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3368673.3333333335, ans=0.1 2023-11-28 05:15:07,515 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:15:18,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=15.0 2023-11-28 05:15:21,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3368806.6666666665, ans=0.0 2023-11-28 05:15:33,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3368873.3333333335, ans=0.0 2023-11-28 05:15:49,203 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505350 2023-11-28 05:15:52,952 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 350, loss[loss=0.05425, simple_loss=0.07315, pruned_loss=0.005972, audio_tagging_loss=0.01171, over 15644.00 frames. ], tot_loss[loss=0.06905, simple_loss=0.09265, pruned_loss=0.01271, audio_tagging_loss=0.01002, over 2530295.44 frames. 
], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:16:00,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3369006.6666666665, ans=0.1 2023-11-28 05:16:25,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3369206.6666666665, ans=0.125 2023-11-28 05:16:33,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3369206.6666666665, ans=0.0 2023-11-28 05:16:42,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.904e+01 9.500e+01 1.023e+02 1.547e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 05:16:46,286 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505400 2023-11-28 05:16:49,841 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 400, loss[loss=0.08505, simple_loss=0.117, pruned_loss=0.01933, audio_tagging_loss=0.007232, over 15961.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09153, pruned_loss=0.01268, audio_tagging_loss=0.009705, over 2645683.70 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:16:50,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0 2023-11-28 05:17:03,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3369406.6666666665, ans=0.1 2023-11-28 05:17:33,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3369540.0, ans=0.1 2023-11-28 05:17:41,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3369606.6666666665, ans=0.125 2023-11-28 05:17:43,288 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505450 2023-11-28 05:17:46,361 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 450, loss[loss=0.06248, simple_loss=0.08658, pruned_loss=0.01158, audio_tagging_loss=0.00761, over 13965.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09123, pruned_loss=0.01249, audio_tagging_loss=0.009484, over 2728498.72 frames. ], batch size: 52, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:18:15,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3369806.6666666665, ans=0.125 2023-11-28 05:18:18,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=22.5 2023-11-28 05:18:26,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.49 vs. 
limit=6.0 2023-11-28 05:18:37,193 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 8.667e+01 9.242e+01 1.003e+02 1.378e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 05:18:37,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3369940.0, ans=0.125 2023-11-28 05:18:41,071 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505500 2023-11-28 05:18:43,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3370006.6666666665, ans=0.1 2023-11-28 05:18:44,341 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 500, loss[loss=0.06565, simple_loss=0.08682, pruned_loss=0.01113, audio_tagging_loss=0.01111, over 15331.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09065, pruned_loss=0.01253, audio_tagging_loss=0.009386, over 2796830.76 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:18:45,643 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:19:00,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3370073.3333333335, ans=0.125 2023-11-28 05:19:08,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3370140.0, ans=0.0 2023-11-28 05:19:38,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505550 2023-11-28 05:19:41,715 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 550, loss[loss=0.05574, simple_loss=0.06355, pruned_loss=0.01351, audio_tagging_loss=0.01046, over 14136.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09065, pruned_loss=0.01259, audio_tagging_loss=0.009251, over 2842836.46 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:19:52,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3370406.6666666665, ans=0.125 2023-11-28 05:19:53,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3370406.6666666665, ans=0.1 2023-11-28 05:20:30,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3370606.6666666665, ans=0.125 2023-11-28 05:20:31,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3370606.6666666665, ans=0.125 2023-11-28 05:20:32,175 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 9.076e+01 9.606e+01 1.009e+02 1.464e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 05:20:36,148 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505600 2023-11-28 05:20:39,651 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 600, loss[loss=0.08506, simple_loss=0.1201, pruned_loss=0.01622, audio_tagging_loss=0.008811, over 16267.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09062, pruned_loss=0.01257, audio_tagging_loss=0.009142, over 2884751.41 frames. 
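The ScheduledFloat lines emitted by scaling.py report hyperparameters (balancer probabilities, skip rates, dropout_p, min/max bounds) whose values are scheduled as a function of batch_count rather than fixed; by batch_count ~3.37M every schedule here has long since reached its final value. A rough sketch of the underlying idea, simplified and not the actual class (the real ScheduledFloat carries defaults and arithmetic operators on top of the interpolation):

```python
# Simplified sketch of a piecewise-linear float schedule, in the spirit of
# icefall's ScheduledFloat: linear between (batch_count, value) knots,
# clamped to the end values outside the knot range.
from bisect import bisect_right

def scheduled_float(schedule: list[tuple[float, float]], batch_count: float) -> float:
    """schedule: sorted (batch_count, value) pairs."""
    xs = [x for x, _ in schedule]
    i = bisect_right(xs, batch_count)
    if i == 0:
        return schedule[0][1]
    if i == len(schedule):
        return schedule[-1][1]
    (x0, y0), (x1, y1) = schedule[i - 1], schedule[i]
    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout annealed from 0.3 to 0.1 over the first 20k batches is
# pinned at 0.1 by the batch counts seen in this log:
assert scheduled_float([(0.0, 0.3), (20000.0, 0.1)], 3368473.0) == 0.1
```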
], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:20:43,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3370673.3333333335, ans=0.2 2023-11-28 05:21:01,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3370806.6666666665, ans=0.035 2023-11-28 05:21:09,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3370806.6666666665, ans=0.125 2023-11-28 05:21:13,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3370873.3333333335, ans=0.125 2023-11-28 05:21:19,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370873.3333333335, ans=0.1 2023-11-28 05:21:24,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370940.0, ans=0.1 2023-11-28 05:21:30,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3370940.0, ans=0.2 2023-11-28 05:21:34,462 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505650 2023-11-28 05:21:37,651 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 650, loss[loss=0.07929, simple_loss=0.1005, pruned_loss=0.01864, audio_tagging_loss=0.01039, over 15739.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09078, pruned_loss=0.01271, audio_tagging_loss=0.008991, over 2925383.05 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:21:37,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3371006.6666666665, ans=0.1 2023-11-28 05:21:38,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3371006.6666666665, ans=0.0 2023-11-28 05:21:40,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3371006.6666666665, ans=0.125 2023-11-28 05:21:54,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3371073.3333333335, ans=0.2 2023-11-28 05:21:58,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3371073.3333333335, ans=0.2 2023-11-28 05:22:00,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3371140.0, ans=0.125 2023-11-28 05:22:28,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.720e+01 9.285e+01 9.863e+01 1.198e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 05:22:31,970 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505700 2023-11-28 05:22:35,217 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 700, loss[loss=0.07136, simple_loss=0.1064, pruned_loss=0.01009, audio_tagging_loss=0.008069, over 15263.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09015, pruned_loss=0.01253, audio_tagging_loss=0.008913, over 2964426.05 frames. 
], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:22:35,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3371340.0, ans=0.125 2023-11-28 05:22:39,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3371340.0, ans=0.125 2023-11-28 05:22:44,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3371340.0, ans=0.0 2023-11-28 05:22:46,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3371406.6666666665, ans=0.125 2023-11-28 05:23:16,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3371540.0, ans=0.2 2023-11-28 05:23:30,001 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505750 2023-11-28 05:23:32,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3371673.3333333335, ans=0.2 2023-11-28 05:23:33,237 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 750, loss[loss=0.04005, simple_loss=0.03616, pruned_loss=0.00842, audio_tagging_loss=0.01355, over 14548.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08995, pruned_loss=0.01258, audio_tagging_loss=0.00889, over 2985999.42 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:23:33,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3371673.3333333335, ans=0.125 2023-11-28 05:23:53,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3371740.0, ans=0.09899494936611666 2023-11-28 05:23:59,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2023-11-28 05:23:59,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3371806.6666666665, ans=0.125 2023-11-28 05:23:59,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3371806.6666666665, ans=0.1 2023-11-28 05:24:04,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2023-11-28 05:24:06,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3371873.3333333335, ans=0.125 2023-11-28 05:24:15,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3371873.3333333335, ans=0.125 2023-11-28 05:24:24,151 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.959e+01 9.414e+01 9.993e+01 1.273e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:24:27,562 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505800 2023-11-28 05:24:31,346 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 800, loss[loss=0.07336, simple_loss=0.09853, pruned_loss=0.0143, audio_tagging_loss=0.009795, over 15927.00 frames. 
], tot_loss[loss=0.06703, simple_loss=0.09058, pruned_loss=0.01281, audio_tagging_loss=0.00892, over 3002449.32 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:24:33,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3372006.6666666665, ans=0.0 2023-11-28 05:24:38,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3372006.6666666665, ans=0.0 2023-11-28 05:24:40,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-28 05:24:43,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3372073.3333333335, ans=0.0 2023-11-28 05:24:47,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2023-11-28 05:25:05,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3372206.6666666665, ans=0.125 2023-11-28 05:25:24,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0 2023-11-28 05:25:24,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505850 2023-11-28 05:25:24,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3372273.3333333335, ans=0.2 2023-11-28 05:25:28,110 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 850, loss[loss=0.07376, simple_loss=0.1016, pruned_loss=0.01558, audio_tagging_loss=0.007372, over 14726.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09063, pruned_loss=0.01271, audio_tagging_loss=0.009019, over 3011303.05 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:25:48,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3372406.6666666665, ans=0.125 2023-11-28 05:25:59,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3372473.3333333335, ans=0.0 2023-11-28 05:26:05,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372540.0, ans=0.1 2023-11-28 05:26:18,491 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 8.784e+01 9.411e+01 9.995e+01 2.932e+02, threshold=1.882e+02, percent-clipped=1.0 2023-11-28 05:26:21,827 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505900 2023-11-28 05:26:26,159 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 900, loss[loss=0.06437, simple_loss=0.07799, pruned_loss=0.01462, audio_tagging_loss=0.01076, over 14894.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09125, pruned_loss=0.01291, audio_tagging_loss=0.009091, over 3017970.27 frames. 
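The Clipping_scale=2.0 lines from optim.py summarize the recent distribution of gradient norms (min/Q1/median/Q3/max) and derive the clipping threshold from it: in every instance in this stretch the threshold equals Clipping_scale times the logged median (e.g. 2.0 x 9.411e+01 = 1.882e+02 in the line above). percent-clipped then reports how often recent norms exceeded that threshold; the 2.932e+02 maximum above, the only logged norm beyond its threshold, lines up with the lone nonzero percent-clipped=1.0. A hedged sketch of that bookkeeping, consistent with the logged numbers but not taken from the ScaledAdam source:

```python
# Sketch of quartile-based gradient clipping as suggested by the optim.py
# lines: threshold = clipping_scale * median of recent grad norms.
import torch

def clip_by_median(params, recent_norms: list[float],
                   clipping_scale: float = 2.0) -> float:
    norms = torch.tensor(recent_norms)
    # min / Q1 / median / Q3 / max, as printed in the log:
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()   # 2.0 x median
    torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
    return threshold

# 2.0 * 9.411e+01 reproduces the logged threshold=1.882e+02 above.
```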
], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:26:27,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3372673.3333333335, ans=0.07 2023-11-28 05:26:28,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3372673.3333333335, ans=0.125 2023-11-28 05:26:29,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3372673.3333333335, ans=0.0 2023-11-28 05:26:31,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-11-28 05:26:41,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=22.5 2023-11-28 05:27:01,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3372873.3333333335, ans=0.125 2023-11-28 05:27:15,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5 2023-11-28 05:27:19,697 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505950 2023-11-28 05:27:23,365 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 950, loss[loss=0.06305, simple_loss=0.07688, pruned_loss=0.01139, audio_tagging_loss=0.01321, over 14307.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.0915, pruned_loss=0.01286, audio_tagging_loss=0.009033, over 3025442.07 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:27:24,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3373006.6666666665, ans=0.0 2023-11-28 05:27:37,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3373073.3333333335, ans=0.025 2023-11-28 05:27:45,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-11-28 05:27:48,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3373140.0, ans=0.0 2023-11-28 05:27:51,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3373140.0, ans=0.125 2023-11-28 05:28:14,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.684e+01 9.471e+01 1.027e+02 1.244e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 05:28:17,480 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506000 2023-11-28 05:28:21,347 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1000, loss[loss=0.06242, simple_loss=0.0869, pruned_loss=0.01159, audio_tagging_loss=0.007383, over 14918.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09095, pruned_loss=0.01285, audio_tagging_loss=0.008901, over 3027964.42 frames. 
], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:28:21,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3373340.0, ans=0.125 2023-11-28 05:28:28,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3373340.0, ans=0.125 2023-11-28 05:28:47,908 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:28:59,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3373540.0, ans=10.0 2023-11-28 05:29:06,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3373606.6666666665, ans=0.125 2023-11-28 05:29:14,892 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506050 2023-11-28 05:29:18,175 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1050, loss[loss=0.06402, simple_loss=0.09293, pruned_loss=0.0104, audio_tagging_loss=0.007151, over 16490.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0899, pruned_loss=0.01267, audio_tagging_loss=0.00874, over 3028290.99 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:29:36,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=22.5 2023-11-28 05:29:36,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3373740.0, ans=0.0 2023-11-28 05:29:52,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3373873.3333333335, ans=0.1 2023-11-28 05:30:09,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.822e+01 9.430e+01 1.008e+02 1.221e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:30:12,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=12.0 2023-11-28 05:30:13,088 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506100 2023-11-28 05:30:16,826 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1100, loss[loss=0.05614, simple_loss=0.07506, pruned_loss=0.009804, audio_tagging_loss=0.00881, over 14093.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08978, pruned_loss=0.01251, audio_tagging_loss=0.008633, over 3029940.51 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:30:19,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2023-11-28 05:30:21,721 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:30:26,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3374006.6666666665, ans=0.0 2023-11-28 05:30:27,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3374073.3333333335, ans=0.0 2023-11-28 05:30:50,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3374206.6666666665, ans=0.125 2023-11-28 05:31:00,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.88 vs. limit=15.0 2023-11-28 05:31:11,334 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506150 2023-11-28 05:31:14,620 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1150, loss[loss=0.0587, simple_loss=0.07655, pruned_loss=0.01008, audio_tagging_loss=0.01034, over 13920.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.089, pruned_loss=0.01237, audio_tagging_loss=0.008638, over 3026948.75 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:31:20,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3374340.0, ans=0.125 2023-11-28 05:31:30,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=22.5 2023-11-28 05:31:35,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3374406.6666666665, ans=0.125 2023-11-28 05:31:54,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2023-11-28 05:31:56,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3374540.0, ans=0.0 2023-11-28 05:31:56,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-28 05:32:06,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.704e+01 9.429e+01 9.950e+01 1.461e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:32:08,331 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506200 2023-11-28 05:32:11,929 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1200, loss[loss=0.06142, simple_loss=0.08481, pruned_loss=0.01059, audio_tagging_loss=0.008419, over 14280.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08918, pruned_loss=0.01245, audio_tagging_loss=0.008573, over 3027619.38 frames. 
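The WARNING lines exclude AudioSet cuts (which carry only a placeholder transcript) whose encoder output would be shorter than their token sequence: with the usual zipformer convolutional frontend, 100 input frames become ((100 - 7) // 2 + 1) // 2 = 23 frames after 4x subsampling, one fewer than the 24 BPE tokens, and the training filter rejects cuts whose post-subsampling length is shorter than the token sequence. A sketch of the check, hedged since the exact predicate lives in train_asr.py:

```python
# Sketch of the "too short after subsampling" filter implied by the WARNING
# lines; the arithmetic reproduces the logged 100 -> 23 frames.
def frames_after_subsampling(num_frames: int) -> int:
    # Conv frontend with overall subsampling factor 4, as in zipformer recipes:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)   # matches the excluded dummy-text cuts
```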
], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:32:21,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3374673.3333333335, ans=0.125 2023-11-28 05:32:32,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3374740.0, ans=0.125 2023-11-28 05:32:41,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-11-28 05:32:53,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3374873.3333333335, ans=0.5 2023-11-28 05:33:05,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506250 2023-11-28 05:33:09,515 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1250, loss[loss=0.0563, simple_loss=0.0702, pruned_loss=0.009819, audio_tagging_loss=0.01138, over 14643.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08892, pruned_loss=0.01231, audio_tagging_loss=0.008566, over 3040985.25 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:33:11,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3375006.6666666665, ans=0.125 2023-11-28 05:33:21,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3375073.3333333335, ans=0.125 2023-11-28 05:33:25,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3375073.3333333335, ans=0.025 2023-11-28 05:33:43,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3375206.6666666665, ans=0.1 2023-11-28 05:33:51,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=12.0 2023-11-28 05:34:02,428 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.863e+01 9.431e+01 1.030e+02 1.303e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:34:04,670 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506300 2023-11-28 05:34:07,936 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1300, loss[loss=0.06353, simple_loss=0.08518, pruned_loss=0.01299, audio_tagging_loss=0.007948, over 15955.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08844, pruned_loss=0.01216, audio_tagging_loss=0.00856, over 3033316.27 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:34:13,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3375340.0, ans=0.125 2023-11-28 05:34:14,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0 2023-11-28 05:34:20,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=22.5 2023-11-28 05:34:25,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.37 vs. 
limit=12.0 2023-11-28 05:34:47,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3375540.0, ans=0.125 2023-11-28 05:34:48,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3375540.0, ans=0.0 2023-11-28 05:34:58,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3375606.6666666665, ans=0.0 2023-11-28 05:35:01,561 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506350 2023-11-28 05:35:04,839 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1350, loss[loss=0.06518, simple_loss=0.08151, pruned_loss=0.01546, audio_tagging_loss=0.008961, over 15595.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08888, pruned_loss=0.01226, audio_tagging_loss=0.008563, over 3038671.42 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:35:13,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3375673.3333333335, ans=0.0 2023-11-28 05:35:17,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3375740.0, ans=0.125 2023-11-28 05:35:35,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3375806.6666666665, ans=0.125 2023-11-28 05:35:48,619 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:35:52,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3375940.0, ans=0.125 2023-11-28 05:35:52,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3375940.0, ans=0.07 2023-11-28 05:35:57,893 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.640e+01 9.329e+01 1.009e+02 1.189e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 05:35:59,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506400 2023-11-28 05:36:02,720 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1400, loss[loss=0.06941, simple_loss=0.08685, pruned_loss=0.01211, audio_tagging_loss=0.01388, over 15217.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08884, pruned_loss=0.0122, audio_tagging_loss=0.008727, over 3041022.74 frames. 
], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:36:07,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3376006.6666666665, ans=0.125 2023-11-28 05:36:16,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3376073.3333333335, ans=0.0 2023-11-28 05:36:16,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3376073.3333333335, ans=0.125 2023-11-28 05:36:24,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3376073.3333333335, ans=0.0 2023-11-28 05:36:29,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3376140.0, ans=0.0 2023-11-28 05:36:31,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3376140.0, ans=0.0 2023-11-28 05:36:37,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3376206.6666666665, ans=0.0 2023-11-28 05:36:57,736 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506450 2023-11-28 05:37:01,513 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1450, loss[loss=0.06818, simple_loss=0.09589, pruned_loss=0.01109, audio_tagging_loss=0.009143, over 14885.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08903, pruned_loss=0.01219, audio_tagging_loss=0.008742, over 3039161.33 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:37:13,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3376406.6666666665, ans=0.1 2023-11-28 05:37:18,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3376406.6666666665, ans=0.125 2023-11-28 05:37:53,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.627e+01 9.329e+01 1.021e+02 1.483e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 05:37:54,897 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506500 2023-11-28 05:37:58,154 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1500, loss[loss=0.05114, simple_loss=0.0639, pruned_loss=0.007976, audio_tagging_loss=0.01122, over 14989.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08998, pruned_loss=0.01252, audio_tagging_loss=0.008846, over 3038438.66 frames. 
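grad_scale in the batch summaries is the fp16 loss-scaling factor, and it moves the way dynamic loss scaling moves it: doubling after a stretch of overflow-free steps (8.0 -> 16.0 around batch 400, 16.0 -> 32.0 around batch 800) and halving when an inf/nan gradient is detected (back to 16.0 by batch 1400 above). A minimal sketch of that loop using standard torch.cuda.amp machinery; the recipe's actual growth interval and wiring may differ:

```python
# Standard dynamic loss scaling, matching the grad_scale behaviour in the log.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_factor=2.0,
                                   backoff_factor=0.5)

def training_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)       # skipped (and scale halved) on overflow
    scaler.update()              # grows the scale after enough good steps
    return scaler.get_scale()    # the value logged as grad_scale
```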
], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:38:02,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3376673.3333333335, ans=0.0 2023-11-28 05:38:10,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3376740.0, ans=0.0 2023-11-28 05:38:26,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3376806.6666666665, ans=0.1 2023-11-28 05:38:47,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3376940.0, ans=0.07 2023-11-28 05:38:49,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3376940.0, ans=0.125 2023-11-28 05:38:52,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506550 2023-11-28 05:38:56,146 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1550, loss[loss=0.07298, simple_loss=0.09874, pruned_loss=0.01551, audio_tagging_loss=0.008097, over 15080.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09051, pruned_loss=0.01265, audio_tagging_loss=0.00888, over 3035975.22 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:39:12,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0 2023-11-28 05:39:23,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3377140.0, ans=0.0 2023-11-28 05:39:24,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3377140.0, ans=0.1 2023-11-28 05:39:25,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5 2023-11-28 05:39:28,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3377140.0, ans=0.0 2023-11-28 05:39:49,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.333e+01 9.028e+01 9.506e+01 1.021e+02 1.396e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 05:39:51,050 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506600 2023-11-28 05:39:54,699 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1600, loss[loss=0.06528, simple_loss=0.09156, pruned_loss=0.01097, audio_tagging_loss=0.008527, over 14299.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09037, pruned_loss=0.01252, audio_tagging_loss=0.008943, over 3046127.55 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:39:58,155 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:40:03,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3377340.0, ans=0.125 2023-11-28 05:40:04,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=15.0 2023-11-28 05:40:22,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3377473.3333333335, ans=0.125 2023-11-28 05:40:47,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2023-11-28 05:40:49,089 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506650 2023-11-28 05:40:52,364 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1650, loss[loss=0.07569, simple_loss=0.1051, pruned_loss=0.01279, audio_tagging_loss=0.01034, over 14367.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09091, pruned_loss=0.0127, audio_tagging_loss=0.008867, over 3046525.23 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:40:54,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3377673.3333333335, ans=0.125 2023-11-28 05:41:14,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3377806.6666666665, ans=0.125 2023-11-28 05:41:19,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=22.5 2023-11-28 05:41:25,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3377806.6666666665, ans=0.125 2023-11-28 05:41:29,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-11-28 05:41:43,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-28 05:41:46,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.798e+01 9.580e+01 1.024e+02 1.381e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 05:41:46,408 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506700 2023-11-28 05:41:50,484 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1700, loss[loss=0.07923, simple_loss=0.1137, pruned_loss=0.01678, audio_tagging_loss=0.00561, over 14745.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09009, pruned_loss=0.01248, audio_tagging_loss=0.008999, over 3051260.25 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:41:53,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3378006.6666666665, ans=0.0 2023-11-28 05:42:10,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3378073.3333333335, ans=0.0 2023-11-28 05:42:44,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506750 2023-11-28 05:42:45,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=12.0 2023-11-28 05:42:48,352 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1750, loss[loss=0.08812, simple_loss=0.114, pruned_loss=0.02125, audio_tagging_loss=0.009847, over 14410.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09005, pruned_loss=0.01245, audio_tagging_loss=0.008991, over 3050189.95 frames. 
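The Whitening lines compare a per-module whiteness statistic against a limit (e.g. metric=12.75 vs. limit=15.0 above): the metric is 1.0 when the feature covariance is isotropic within each channel group and grows as variance concentrates in a few directions, and modules whose metric drifts past the limit get pushed back toward whiter activations. A sketch of one standard formulation of such a metric; this is an assumption about the statistic scaling.py computes, not verified against its source:

```python
# Sketch of a group-wise whitening metric: ~1.0 for isotropic ("white")
# features, approaching num_channels when one direction dominates.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].T @ x[:, g, :] / n        # per-group covariance
        c_g = cov.shape[0]
        # c_g * sum(eig^2) / (sum(eig))^2 == c_g * ||cov||_F^2 / trace(cov)^2
        metrics.append(c_g * (cov * cov).sum() / cov.diagonal().sum() ** 2)
    return torch.stack(metrics).mean().item()

# Isotropic Gaussian features score ~1.0, far below the limits (10.0-22.5)
# seen in the log:
print(whitening_metric(torch.randn(10000, 256), num_groups=1))
```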
], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:42:48,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3378340.0, ans=0.125 2023-11-28 05:42:51,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3378340.0, ans=0.1 2023-11-28 05:43:16,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3378473.3333333335, ans=0.125 2023-11-28 05:43:23,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3378540.0, ans=0.1 2023-11-28 05:43:27,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=22.5 2023-11-28 05:43:41,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.622e+01 9.287e+01 9.848e+01 1.344e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 05:43:42,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506800 2023-11-28 05:43:45,540 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1800, loss[loss=0.05621, simple_loss=0.0716, pruned_loss=0.01039, audio_tagging_loss=0.01003, over 14805.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08961, pruned_loss=0.01243, audio_tagging_loss=0.00895, over 3055123.54 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:43:47,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0 2023-11-28 05:43:48,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-11-28 05:43:55,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3378673.3333333335, ans=0.2 2023-11-28 05:44:06,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3378740.0, ans=0.0 2023-11-28 05:44:07,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3378806.6666666665, ans=0.2 2023-11-28 05:44:19,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3378873.3333333335, ans=0.125 2023-11-28 05:44:39,478 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506850 2023-11-28 05:44:43,410 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1850, loss[loss=0.06464, simple_loss=0.08766, pruned_loss=0.01348, audio_tagging_loss=0.007332, over 14420.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08959, pruned_loss=0.01237, audio_tagging_loss=0.008804, over 3049443.24 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:44:57,784 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:45:15,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3379140.0, ans=0.0 2023-11-28 05:45:18,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. 
limit=6.0 2023-11-28 05:45:32,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3379273.3333333335, ans=0.0 2023-11-28 05:45:34,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.12 vs. limit=22.5 2023-11-28 05:45:37,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.914e+01 9.536e+01 1.008e+02 1.259e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 05:45:37,647 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506900 2023-11-28 05:45:37,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3379273.3333333335, ans=0.07 2023-11-28 05:45:41,363 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1900, loss[loss=0.08099, simple_loss=0.1118, pruned_loss=0.01685, audio_tagging_loss=0.008228, over 15149.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09, pruned_loss=0.01232, audio_tagging_loss=0.008798, over 3050947.79 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:45:51,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3379406.6666666665, ans=0.125 2023-11-28 05:45:56,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=12.0 2023-11-28 05:46:03,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3379473.3333333335, ans=0.125 2023-11-28 05:46:04,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5 2023-11-28 05:46:25,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2023-11-28 05:46:35,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506950 2023-11-28 05:46:38,582 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1950, loss[loss=0.06513, simple_loss=0.08584, pruned_loss=0.01014, audio_tagging_loss=0.01207, over 15248.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0899, pruned_loss=0.01243, audio_tagging_loss=0.008753, over 3047101.38 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:46:44,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2023-11-28 05:47:22,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3379873.3333333335, ans=0.1 2023-11-28 05:47:32,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.861e+01 9.415e+01 1.012e+02 1.225e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:47:32,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507000 2023-11-28 05:47:36,948 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2000, loss[loss=0.06778, simple_loss=0.09674, pruned_loss=0.01286, audio_tagging_loss=0.006546, over 14759.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08877, pruned_loss=0.01239, audio_tagging_loss=0.008682, over 3048365.42 frames. 
], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:47:46,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3380006.6666666665, ans=10.0 2023-11-28 05:47:55,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3380073.3333333335, ans=0.125 2023-11-28 05:47:59,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3380140.0, ans=0.125 2023-11-28 05:47:59,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3380140.0, ans=0.0 2023-11-28 05:48:02,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0 2023-11-28 05:48:12,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3380206.6666666665, ans=0.125 2023-11-28 05:48:31,183 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507050 2023-11-28 05:48:34,870 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2050, loss[loss=0.05829, simple_loss=0.0781, pruned_loss=0.009393, audio_tagging_loss=0.009849, over 15854.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08897, pruned_loss=0.01233, audio_tagging_loss=0.008681, over 3043433.59 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:48:43,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3380340.0, ans=0.125 2023-11-28 05:49:03,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3380473.3333333335, ans=0.0 2023-11-28 05:49:06,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2023-11-28 05:49:29,185 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507100 2023-11-28 05:49:30,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.113e+01 9.631e+01 1.014e+02 1.250e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 05:49:32,371 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2100, loss[loss=0.07437, simple_loss=0.09666, pruned_loss=0.01639, audio_tagging_loss=0.009643, over 15692.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08944, pruned_loss=0.0123, audio_tagging_loss=0.008631, over 3052891.72 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:49:37,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3380673.3333333335, ans=0.09899494936611666 2023-11-28 05:49:43,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3380740.0, ans=0.1 2023-11-28 05:49:45,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3380740.0, ans=0.2 2023-11-28 05:49:54,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. 
limit=22.5 2023-11-28 05:49:58,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.11 vs. limit=22.5 2023-11-28 05:50:12,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3380873.3333333335, ans=0.125 2023-11-28 05:50:16,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=15.0 2023-11-28 05:50:18,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3380940.0, ans=0.1 2023-11-28 05:50:19,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3380940.0, ans=0.0 2023-11-28 05:50:25,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3380940.0, ans=0.125 2023-11-28 05:50:26,441 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507150 2023-11-28 05:50:29,603 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2150, loss[loss=0.06073, simple_loss=0.08735, pruned_loss=0.008711, audio_tagging_loss=0.008346, over 16050.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08915, pruned_loss=0.01224, audio_tagging_loss=0.008718, over 3057906.87 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:50:31,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3381006.6666666665, ans=0.125 2023-11-28 05:50:33,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3381006.6666666665, ans=0.125 2023-11-28 05:50:51,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3381073.3333333335, ans=0.125 2023-11-28 05:51:07,301 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:51:15,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2023-11-28 05:51:25,082 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507200 2023-11-28 05:51:26,068 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.657e+01 9.306e+01 1.016e+02 1.700e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-28 05:51:28,728 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2200, loss[loss=0.07115, simple_loss=0.0932, pruned_loss=0.01518, audio_tagging_loss=0.009376, over 15217.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08949, pruned_loss=0.01234, audio_tagging_loss=0.00868, over 3054920.06 frames. 
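Each batch summary carries two dumps: loss[...] for the current batch and tot_loss[...] for a running aggregate. The frame counts attached to tot_loss climb from ~2.2M early in the epoch and then hover near ~3.0M, which is the signature of a decayed running sum with an effective window of a couple hundred batches at roughly 15k frames per batch. A sketch under that assumption; the window length is inferred from the log, not read from the training script:

```python
# Decayed running aggregate consistent with the tot_loss frame counts
# plateauing near ~3.0e6; window=200 is an assumption.
def update_tot(tot: dict, batch: dict, window: float = 200.0) -> dict:
    keep = 1.0 - 1.0 / window
    return {k: keep * tot.get(k, 0.0) + batch[k] for k in batch}

tot = {"frames": 0.0}
for _ in range(2000):                  # long after warm-up...
    tot = update_tot(tot, {"frames": 15000.0})
print(tot["frames"])                   # ...plateaus near 200 * 15000 = 3.0e6
```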
], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:51:43,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3381406.6666666665, ans=0.2 2023-11-28 05:51:48,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3381406.6666666665, ans=0.125 2023-11-28 05:51:50,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3381473.3333333335, ans=0.125 2023-11-28 05:51:54,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3381473.3333333335, ans=0.125 2023-11-28 05:51:56,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3381473.3333333335, ans=0.0 2023-11-28 05:52:04,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3381540.0, ans=0.2 2023-11-28 05:52:23,779 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507250 2023-11-28 05:52:26,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3381673.3333333335, ans=0.125 2023-11-28 05:52:27,082 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2250, loss[loss=0.06308, simple_loss=0.09103, pruned_loss=0.008917, audio_tagging_loss=0.00865, over 15556.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08984, pruned_loss=0.0123, audio_tagging_loss=0.008713, over 3042613.61 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:52:38,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3381740.0, ans=0.0 2023-11-28 05:52:50,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3381806.6666666665, ans=0.1 2023-11-28 05:53:04,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3381873.3333333335, ans=0.125 2023-11-28 05:53:13,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3381940.0, ans=0.125 2023-11-28 05:53:20,981 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507300 2023-11-28 05:53:22,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 8.876e+01 9.357e+01 9.943e+01 1.279e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 05:53:24,249 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2300, loss[loss=0.07042, simple_loss=0.08314, pruned_loss=0.01675, audio_tagging_loss=0.0121, over 15326.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08912, pruned_loss=0.01226, audio_tagging_loss=0.008739, over 3041545.11 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:53:48,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3382140.0, ans=0.125 2023-11-28 05:53:49,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3382140.0, ans=0.2 2023-11-28 05:54:00,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.06 vs. 
limit=22.5 2023-11-28 05:54:17,401 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:54:18,536 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507350 2023-11-28 05:54:21,712 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2350, loss[loss=0.04895, simple_loss=0.06653, pruned_loss=0.005779, audio_tagging_loss=0.009901, over 16246.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08938, pruned_loss=0.01219, audio_tagging_loss=0.008769, over 3046498.87 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:54:32,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3382340.0, ans=0.2 2023-11-28 05:55:12,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3382606.6666666665, ans=0.125 2023-11-28 05:55:17,744 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507400 2023-11-28 05:55:18,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.819e+01 9.502e+01 1.018e+02 1.349e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 05:55:21,476 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2400, loss[loss=0.07161, simple_loss=0.1064, pruned_loss=0.01118, audio_tagging_loss=0.007248, over 16174.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08995, pruned_loss=0.01233, audio_tagging_loss=0.008805, over 3048808.05 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:55:35,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3382740.0, ans=0.125 2023-11-28 05:55:39,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2023-11-28 05:55:59,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3382873.3333333335, ans=0.1 2023-11-28 05:56:08,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3382940.0, ans=0.125 2023-11-28 05:56:10,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3382940.0, ans=0.125 2023-11-28 05:56:14,601 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507450 2023-11-28 05:56:17,866 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2450, loss[loss=0.08133, simple_loss=0.1095, pruned_loss=0.01783, audio_tagging_loss=0.008733, over 14864.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09071, pruned_loss=0.01251, audio_tagging_loss=0.008797, over 3041563.09 frames. 
2023-11-28 05:56:24,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3383006.6666666665, ans=0.125
2023-11-28 05:56:39,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0
2023-11-28 05:56:51,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3383140.0, ans=0.125
2023-11-28 05:57:12,371 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507500
2023-11-28 05:57:13,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.778e+01 9.508e+01 1.025e+02 1.201e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 05:57:15,555 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2500, loss[loss=0.03919, simple_loss=0.05135, pruned_loss=0.004016, audio_tagging_loss=0.009501, over 14620.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09067, pruned_loss=0.01234, audio_tagging_loss=0.008796, over 3042852.63 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:57:20,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3383340.0, ans=0.125
2023-11-28 05:57:32,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=22.5
2023-11-28 05:57:40,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3383473.3333333335, ans=0.2
2023-11-28 05:58:00,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3383606.6666666665, ans=0.1
2023-11-28 05:58:10,252 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507550
2023-11-28 05:58:14,126 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2550, loss[loss=0.07472, simple_loss=0.1104, pruned_loss=0.01179, audio_tagging_loss=0.007706, over 13767.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09077, pruned_loss=0.0124, audio_tagging_loss=0.008711, over 3036280.30 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 05:58:30,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3383740.0, ans=0.0
2023-11-28 05:58:54,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3383873.3333333335, ans=0.025
2023-11-28 05:58:57,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3383873.3333333335, ans=0.0
2023-11-28 05:59:02,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=15.0
2023-11-28 05:59:08,263 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507600
2023-11-28 05:59:09,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 8.563e+01 9.261e+01 9.725e+01 1.208e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-28 05:59:11,673 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2600, loss[loss=0.05777, simple_loss=0.07957, pruned_loss=0.007639, audio_tagging_loss=0.01035, over 15653.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09097, pruned_loss=0.01237, audio_tagging_loss=0.008588, over 3031531.53 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
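Across these train_asr.py:1235 entries, the logged loss fields are consistent with a total of the form loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. a half-weighted simple (linear AM+LM) transducer term plus the pruned RNN-T term plus the audio-tagging distillation term at full scale. The 0.5 weight is inferred from the numbers themselves, as the check below shows.

    # The logged components reproduce the logged totals (up to rounding) under
    #   loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    # Checking against the "batch 2600" tot_loss entry just above:
    simple_loss, pruned_loss, audio_tagging_loss = 0.09097, 0.01237, 0.008588
    total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(total, 5))  # 0.06644, matching tot_loss[loss=0.06644, ...]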
2023-11-28 05:59:17,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3384006.6666666665, ans=0.125
2023-11-28 05:59:40,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3384140.0, ans=0.0
2023-11-28 05:59:47,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3384206.6666666665, ans=0.2
2023-11-28 06:00:05,554 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507650
2023-11-28 06:00:09,387 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2650, loss[loss=0.06535, simple_loss=0.09105, pruned_loss=0.01077, audio_tagging_loss=0.009052, over 17119.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09055, pruned_loss=0.01238, audio_tagging_loss=0.008489, over 3036735.52 frames. ], batch size: 66, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:00:14,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=22.5
2023-11-28 06:00:22,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3384406.6666666665, ans=0.0
2023-11-28 06:00:35,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3384473.3333333335, ans=0.5
2023-11-28 06:01:03,458 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507700
2023-11-28 06:01:04,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.714e+01 9.424e+01 1.027e+02 1.447e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-28 06:01:07,163 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2700, loss[loss=0.04495, simple_loss=0.04822, pruned_loss=0.006674, audio_tagging_loss=0.01417, over 14127.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09016, pruned_loss=0.01244, audio_tagging_loss=0.008555, over 3038798.30 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
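The optim.py:476 entries report the quartiles (min / 25% / median / 75% / max) of recent gradient norms, and with Clipping_scale=2.0 the logged threshold sits at twice the logged median (2.0 * 9.424e+01 = 1.885e+02 in the entry just above); percent-clipped counts how often the norm exceeded it. Below is a hedged sketch of median-based clipping in that spirit, not the optimizer's actual code.

    import torch

    # Sketch of median-based gradient clipping: keep a window of recent
    # gradient norms, report their quartiles, and clip when the current
    # norm exceeds clipping_scale * median. Not optim.py itself.
    def clip_by_recent_median(params, recent_norms, clipping_scale=2.0, window=128):
        grads = [p.grad for p in params if p.grad is not None]
        total = torch.norm(torch.stack([g.norm() for g in grads]))
        recent_norms.append(float(total))
        hist = sorted(recent_norms[-window:])
        quartiles = [hist[int(q * (len(hist) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = clipping_scale * quartiles[2]  # 2.0 x median, as in the log
        if float(total) > threshold:
            for g in grads:
                g.mul_(threshold / total)
        return quartiles, threshold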
2023-11-28 06:01:21,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3384740.0, ans=0.125
2023-11-28 06:01:40,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3384873.3333333335, ans=0.0
2023-11-28 06:01:44,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3384873.3333333335, ans=0.125
2023-11-28 06:01:51,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3384873.3333333335, ans=0.125
2023-11-28 06:01:55,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3384940.0, ans=0.95
2023-11-28 06:01:57,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3384940.0, ans=0.0
2023-11-28 06:02:01,529 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507750
2023-11-28 06:02:04,875 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2750, loss[loss=0.05425, simple_loss=0.07628, pruned_loss=0.007824, audio_tagging_loss=0.008293, over 15494.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08897, pruned_loss=0.01212, audio_tagging_loss=0.008625, over 3039915.58 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:02:07,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3385006.6666666665, ans=0.035
2023-11-28 06:02:07,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3385006.6666666665, ans=0.125
2023-11-28 06:02:09,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3385006.6666666665, ans=0.125
2023-11-28 06:02:11,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3385006.6666666665, ans=0.1
2023-11-28 06:02:39,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3385206.6666666665, ans=0.05
2023-11-28 06:02:41,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3385206.6666666665, ans=0.95
2023-11-28 06:02:43,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5
2023-11-28 06:02:57,429 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
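The scaling.py:1022 Whitening entries compare a per-module anisotropy statistic of the activation covariance against a limit; the whitening penalty only engages when the metric exceeds the limit, and most entries in this stretch stay under it (13.21 vs. 22.5 just above). One plausible form of such a statistic is sketched below, assuming it is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as the covariance becomes lopsided; the exact formula in scaling.py may differ.

    import torch

    # Hedged sketch of a whitening diagnostic consistent with the
    # "metric=... vs. limit=..." lines. The exact statistic is an assumption.
    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels), roughly zero-mean activations
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # 1.0 when cov is proportional to the identity; larger when a few
        # directions dominate the variance
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(10000, 256)  # approximately white activations
    print(whitening_metric(x))   # ~1.0, far below a limit like 22.5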
2023-11-28 06:02:58,596 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507800
2023-11-28 06:02:59,613 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.812e+01 9.371e+01 1.002e+02 1.155e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-28 06:03:02,359 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2800, loss[loss=0.06052, simple_loss=0.083, pruned_loss=0.01166, audio_tagging_loss=0.007361, over 14377.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08847, pruned_loss=0.01204, audio_tagging_loss=0.008611, over 3040755.77 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:03:42,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2023-11-28 06:03:56,791 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507850
2023-11-28 06:04:00,420 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2850, loss[loss=0.06874, simple_loss=0.09651, pruned_loss=0.01505, audio_tagging_loss=0.005431, over 14958.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08842, pruned_loss=0.01209, audio_tagging_loss=0.008577, over 3042554.20 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:04:00,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3385673.3333333335, ans=0.0
2023-11-28 06:04:03,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3385673.3333333335, ans=0.2
2023-11-28 06:04:05,145 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 06:04:10,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0
2023-11-28 06:04:23,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3385806.6666666665, ans=0.0
2023-11-28 06:04:33,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0
2023-11-28 06:04:35,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3385873.3333333335, ans=0.125
2023-11-28 06:04:36,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. limit=10.0
2023-11-28 06:04:43,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0
2023-11-28 06:04:53,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5
2023-11-28 06:04:54,043 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507900
2023-11-28 06:04:55,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.844e+01 9.452e+01 9.978e+01 1.410e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-28 06:04:57,246 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2900, loss[loss=0.05442, simple_loss=0.06595, pruned_loss=0.009733, audio_tagging_loss=0.01172, over 15493.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08814, pruned_loss=0.01208, audio_tagging_loss=0.008694, over 3043728.68 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:05:04,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3386006.6666666665, ans=0.0
2023-11-28 06:05:05,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3386006.6666666665, ans=0.125
2023-11-28 06:05:27,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3386140.0, ans=0.0
2023-11-28 06:05:27,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3386140.0, ans=0.125
2023-11-28 06:05:43,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0
2023-11-28 06:05:51,640 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507950
2023-11-28 06:05:54,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3386340.0, ans=0.125
2023-11-28 06:05:54,927 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2950, loss[loss=0.07242, simple_loss=0.1008, pruned_loss=0.01128, audio_tagging_loss=0.01075, over 15340.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08865, pruned_loss=0.01224, audio_tagging_loss=0.008708, over 3042116.11 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:06:05,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3386406.6666666665, ans=0.1
2023-11-28 06:06:20,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0
2023-11-28 06:06:24,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3386473.3333333335, ans=0.125
2023-11-28 06:06:24,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.10 vs. limit=15.0
2023-11-28 06:06:32,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3386540.0, ans=0.125
2023-11-28 06:06:33,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3386540.0, ans=0.125
2023-11-28 06:06:43,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3386606.6666666665, ans=0.04949747468305833
2023-11-28 06:06:45,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386606.6666666665, ans=0.1
2023-11-28 06:06:49,085 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508000
2023-11-28 06:06:50,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.945e+01 9.706e+01 1.033e+02 1.300e+02, threshold=1.941e+02, percent-clipped=0.0
2023-11-28 06:06:55,278 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3000, loss[loss=0.07044, simple_loss=0.09628, pruned_loss=0.01325, audio_tagging_loss=0.009047, over 15287.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08865, pruned_loss=0.01222, audio_tagging_loss=0.008766, over 3048245.56 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:06:55,278 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-28 06:07:29,843 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.0576, simple_loss=0.05056, pruned_loss=0.005189, audio_tagging_loss=0.02713, over 4681554.00 frames.
2023-11-28 06:07:29,844 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-28 06:07:34,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3386673.3333333335, ans=0.0
2023-11-28 06:07:54,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3386806.6666666665, ans=0.0
2023-11-28 06:07:56,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386806.6666666665, ans=0.1
2023-11-28 06:07:56,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3386806.6666666665, ans=0.125
2023-11-28 06:08:22,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3386940.0, ans=0.0
2023-11-28 06:08:24,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508050
2023-11-28 06:08:28,263 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3050, loss[loss=0.07124, simple_loss=0.09953, pruned_loss=0.01262, audio_tagging_loss=0.008861, over 15153.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08922, pruned_loss=0.01226, audio_tagging_loss=0.008743, over 3047169.46 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:08:42,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.64 vs. limit=10.0
2023-11-28 06:08:46,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3387073.3333333335, ans=0.125
2023-11-28 06:08:57,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0
2023-11-28 06:09:05,128 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 06:09:22,212 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508100
2023-11-28 06:09:23,658 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.950e+01 9.666e+01 1.022e+02 1.393e+02, threshold=1.933e+02, percent-clipped=0.0
2023-11-28 06:09:26,342 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3100, loss[loss=0.09525, simple_loss=0.1305, pruned_loss=0.02408, audio_tagging_loss=0.00593, over 14438.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08969, pruned_loss=0.01241, audio_tagging_loss=0.008792, over 3051903.25 frames. ], batch size: 52, lr: 1.58e-03, grad_scale: 32.0
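At batch 3000 the loop pauses for the periodic validation pass: a frame-normalized loss over the whole dev set (4681554 frames) is logged next to the peak CUDA memory. A minimal sketch of such a pass follows; loss_fn is a hypothetical stand-in for the per-batch loss computation, not the actual train_asr.py interface.

    import torch

    # Minimal sketch of a frame-normalized validation pass like the
    # "Computing validation loss" entries above. `loss_fn` is a hypothetical
    # stand-in for the per-batch loss computation.
    @torch.no_grad()
    def compute_validation_loss(model, valid_dl, loss_fn):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_dl:
            loss, num_frames = loss_fn(model, batch)
            tot_loss += float(loss)
            tot_frames += num_frames
        model.train()
        # normalized per frame, as in "validation: loss=0.0576 ... over 4681554.00 frames"
        return tot_loss / tot_frames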
2023-11-28 06:09:45,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3387406.6666666665, ans=0.125
2023-11-28 06:09:58,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3387473.3333333335, ans=0.125
2023-11-28 06:10:16,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387606.6666666665, ans=0.1
2023-11-28 06:10:18,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387606.6666666665, ans=0.1
2023-11-28 06:10:20,459 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508150
2023-11-28 06:10:21,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3387606.6666666665, ans=0.0
2023-11-28 06:10:23,659 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3150, loss[loss=0.07728, simple_loss=0.1072, pruned_loss=0.01641, audio_tagging_loss=0.00728, over 15846.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09043, pruned_loss=0.01256, audio_tagging_loss=0.008925, over 3053990.55 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:10:39,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3387740.0, ans=0.05
2023-11-28 06:11:01,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3387873.3333333335, ans=0.125
2023-11-28 06:11:01,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3387873.3333333335, ans=0.125
2023-11-28 06:11:11,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3387940.0, ans=0.2
2023-11-28 06:11:15,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=15.0
2023-11-28 06:11:17,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508200
2023-11-28 06:11:18,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.906e+01 9.448e+01 1.016e+02 1.228e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-28 06:11:20,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3388006.6666666665, ans=0.125
2023-11-28 06:11:22,234 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3200, loss[loss=0.07156, simple_loss=0.0968, pruned_loss=0.01505, audio_tagging_loss=0.008115, over 14305.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09048, pruned_loss=0.01243, audio_tagging_loss=0.008979, over 3055480.28 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:11:31,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3388006.6666666665, ans=0.0
2023-11-28 06:11:52,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3388140.0, ans=0.5
2023-11-28 06:12:06,621 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 06:12:15,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508250
2023-11-28 06:12:18,443 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3250, loss[loss=0.05194, simple_loss=0.06908, pruned_loss=0.00751, audio_tagging_loss=0.00989, over 14772.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08949, pruned_loss=0.0122, audio_tagging_loss=0.009084, over 3053029.94 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0
2023-11-28 06:12:26,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3388340.0, ans=0.0
2023-11-28 06:12:28,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3388340.0, ans=0.2
2023-11-28 06:12:42,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388473.3333333335, ans=0.1
2023-11-28 06:12:46,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5
2023-11-28 06:12:55,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3388540.0, ans=0.0
2023-11-28 06:13:09,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3388606.6666666665, ans=0.125
2023-11-28 06:13:13,072 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508300
2023-11-28 06:13:15,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.835e+01 9.450e+01 1.028e+02 1.248e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-28 06:13:16,203 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3300, loss[loss=0.06083, simple_loss=0.07308, pruned_loss=0.0119, audio_tagging_loss=0.01239, over 15543.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08947, pruned_loss=0.0123, audio_tagging_loss=0.009107, over 3051718.58 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 06:13:23,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3388673.3333333335, ans=0.125
2023-11-28 06:13:32,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=15.0
2023-11-28 06:13:33,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3388740.0, ans=0.025
2023-11-28 06:13:58,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0
2023-11-28 06:14:03,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3388940.0, ans=0.125
2023-11-28 06:14:05,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3388940.0, ans=0.2
2023-11-28 06:14:09,948 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508350
2023-11-28 06:14:13,195 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3350, loss[loss=0.08348, simple_loss=0.1117, pruned_loss=0.01727, audio_tagging_loss=0.01038, over 13970.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08909, pruned_loss=0.01212, audio_tagging_loss=0.009096, over 3050824.68 frames. ], batch size: 50, lr: 1.58e-03, grad_scale: 16.0
2023-11-28 06:14:16,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3389006.6666666665, ans=0.125
2023-11-28 06:14:32,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=22.5
2023-11-28 06:15:08,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508400
2023-11-28 06:15:10,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.815e+01 9.381e+01 1.020e+02 1.211e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-28 06:15:11,906 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3400, loss[loss=0.04973, simple_loss=0.06616, pruned_loss=0.008474, audio_tagging_loss=0.008178, over 16049.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08976, pruned_loss=0.01225, audio_tagging_loss=0.008925, over 3047323.90 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 06:15:19,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3389340.0, ans=0.125
2023-11-28 06:15:24,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3389406.6666666665, ans=0.2
2023-11-28 06:15:27,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0
2023-11-28 06:16:01,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3389606.6666666665, ans=0.2
2023-11-28 06:16:06,779 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508450
2023-11-28 06:16:09,955 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3450, loss[loss=0.08145, simple_loss=0.1021, pruned_loss=0.01913, audio_tagging_loss=0.01126, over 15469.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09045, pruned_loss=0.01237, audio_tagging_loss=0.008829, over 3051600.66 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:16:50,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3389873.3333333335, ans=0.125
2023-11-28 06:16:52,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3389873.3333333335, ans=0.0
2023-11-28 06:16:59,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3389940.0, ans=0.0
2023-11-28 06:17:03,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508500
2023-11-28 06:17:06,976 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.735e+01 9.510e+01 1.030e+02 1.229e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 06:17:07,001 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3500, loss[loss=0.05926, simple_loss=0.08227, pruned_loss=0.009774, audio_tagging_loss=0.008354, over 14666.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09042, pruned_loss=0.01242, audio_tagging_loss=0.008756, over 3048411.82 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:17:09,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3390006.6666666665, ans=0.0
2023-11-28 06:17:28,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3390073.3333333335, ans=0.0
2023-11-28 06:17:32,550 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 06:17:39,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3390140.0, ans=0.0
2023-11-28 06:17:41,199 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 06:17:42,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3390206.6666666665, ans=15.0
2023-11-28 06:17:49,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.64 vs. limit=15.0
2023-11-28 06:18:01,668 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508550
2023-11-28 06:18:04,914 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3550, loss[loss=0.0875, simple_loss=0.124, pruned_loss=0.01833, audio_tagging_loss=0.007155, over 16067.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09098, pruned_loss=0.01269, audio_tagging_loss=0.008711, over 3048946.12 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0
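The grad_scale field tracks the fp16 loss-scaling factor, and its movement in this stretch (32.0 through batch 3250, 16.0 by batch 3300, 8.0 by batch 3450, back to 16.0 at batch 3600) is the signature of dynamic loss scaling: the scale is halved whenever scaled gradients overflow and grown back after a run of clean steps. PyTorch's GradScaler implements this dynamic; the constructor values below are illustrative, not this run's actual configuration.

    import torch

    # Dynamic fp16 loss scaling consistent with the fluctuating grad_scale
    # field. Constructor arguments are illustrative, not icefall's settings.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
    )
    # Schematic training step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)   # hypothetical helper
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)  # skipped when scaled grads contain inf/NaN
    #   scaler.update()         # halves the scale on overflow, doubles it
    #                           # after `growth_interval` overflow-free steps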
2023-11-28 06:18:31,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3390473.3333333335, ans=0.2
2023-11-28 06:18:33,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3390473.3333333335, ans=0.1
2023-11-28 06:18:33,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3390473.3333333335, ans=0.125
2023-11-28 06:18:52,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0
2023-11-28 06:19:00,560 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508600
2023-11-28 06:19:04,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.807e+01 9.210e+01 1.006e+02 1.301e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-28 06:19:04,075 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3600, loss[loss=0.06703, simple_loss=0.09065, pruned_loss=0.01517, audio_tagging_loss=0.006538, over 15834.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09063, pruned_loss=0.01257, audio_tagging_loss=0.008663, over 3043272.87 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 06:19:06,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3390673.3333333335, ans=0.125
2023-11-28 06:19:11,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3390673.3333333335, ans=0.0
2023-11-28 06:19:32,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0
2023-11-28 06:19:33,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3390806.6666666665, ans=0.2
2023-11-28 06:19:44,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3390873.3333333335, ans=0.125
2023-11-28 06:19:57,682 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508650
2023-11-28 06:20:00,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3391006.6666666665, ans=0.0
2023-11-28 06:20:00,976 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3650, loss[loss=0.06977, simple_loss=0.09804, pruned_loss=0.01301, audio_tagging_loss=0.007743, over 15607.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09135, pruned_loss=0.01272, audio_tagging_loss=0.008603, over 3047287.94 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 06:20:03,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3391006.6666666665, ans=0.09899494936611666
2023-11-28 06:20:11,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3391073.3333333335, ans=0.1
2023-11-28 06:20:18,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3391073.3333333335, ans=0.0
2023-11-28 06:20:19,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3391073.3333333335, ans=0.125
2023-11-28 06:20:23,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3391140.0, ans=0.125
2023-11-28 06:20:27,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3391140.0, ans=0.035
2023-11-28 06:20:30,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3391140.0, ans=0.2
2023-11-28 06:20:31,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3391140.0, ans=0.125
2023-11-28 06:20:47,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3391273.3333333335, ans=0.125
2023-11-28 06:20:54,986 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508700
2023-11-28 06:20:58,169 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.782e+01 9.556e+01 1.009e+02 1.270e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-28 06:20:58,194 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3700, loss[loss=0.06256, simple_loss=0.08812, pruned_loss=0.01125, audio_tagging_loss=0.007247, over 13719.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09204, pruned_loss=0.01282, audio_tagging_loss=0.008498, over 3053651.31 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 06:21:02,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3391340.0, ans=0.125
2023-11-28 06:21:16,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3391406.6666666665, ans=0.2
2023-11-28 06:21:24,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0
2023-11-28 06:21:34,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3391540.0, ans=0.07
2023-11-28 06:21:53,592 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508750
2023-11-28 06:21:56,792 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3750, loss[loss=0.05152, simple_loss=0.07456, pruned_loss=0.009023, audio_tagging_loss=0.005211, over 16190.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09164, pruned_loss=0.01288, audio_tagging_loss=0.008571, over 3052231.14 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 06:22:19,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3391806.6666666665, ans=0.125
2023-11-28 06:22:24,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3391806.6666666665, ans=0.125
2023-11-28 06:22:25,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3391806.6666666665, ans=0.125
2023-11-28 06:22:40,379 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 06:22:50,267 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508800
2023-11-28 06:22:51,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3391940.0, ans=0.0
2023-11-28 06:22:53,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3392006.6666666665, ans=0.125
2023-11-28 06:22:53,856 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3800, loss[loss=0.04714, simple_loss=0.05445, pruned_loss=0.008146, audio_tagging_loss=0.01177, over 15859.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09107, pruned_loss=0.01275, audio_tagging_loss=0.008746, over 3051882.25 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:22:54,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.959e+01 9.739e+01 1.027e+02 1.673e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-28 06:22:56,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3392006.6666666665, ans=10.0
2023-11-28 06:23:06,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3392073.3333333335, ans=0.1
2023-11-28 06:23:19,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0
2023-11-28 06:23:40,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3392273.3333333335, ans=0.125
2023-11-28 06:23:48,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=8.0
2023-11-28 06:23:48,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508850
2023-11-28 06:23:51,650 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3850, loss[loss=0.06873, simple_loss=0.09114, pruned_loss=0.01439, audio_tagging_loss=0.008773, over 15593.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08991, pruned_loss=0.01251, audio_tagging_loss=0.008824, over 3047942.10 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:23:56,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3392340.0, ans=0.125
2023-11-28 06:23:58,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=12.0
2023-11-28 06:24:08,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3392406.6666666665, ans=0.125
2023-11-28 06:24:13,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3392406.6666666665, ans=0.0
2023-11-28 06:24:16,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3392473.3333333335, ans=0.1
2023-11-28 06:24:18,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=22.5
2023-11-28 06:24:38,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0
2023-11-28 06:24:39,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3392606.6666666665, ans=0.0
2023-11-28 06:24:46,353 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508900
2023-11-28 06:24:50,031 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3900, loss[loss=0.06532, simple_loss=0.08287, pruned_loss=0.01321, audio_tagging_loss=0.01068, over 15515.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09006, pruned_loss=0.0126, audio_tagging_loss=0.008807, over 3040360.03 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:24:51,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.618e+01 9.454e+01 1.005e+02 2.661e+02, threshold=1.891e+02, percent-clipped=1.0
2023-11-28 06:24:55,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3392673.3333333335, ans=0.0
2023-11-28 06:25:20,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0
2023-11-28 06:25:31,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3392873.3333333335, ans=0.0
2023-11-28 06:25:44,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508950
2023-11-28 06:25:48,003 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3950, loss[loss=0.0468, simple_loss=0.06157, pruned_loss=0.0065, audio_tagging_loss=0.00952, over 14052.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09038, pruned_loss=0.01264, audio_tagging_loss=0.008847, over 3038438.34 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:26:07,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3393073.3333333335, ans=0.1
2023-11-28 06:26:17,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3393140.0, ans=0.0
2023-11-28 06:26:19,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3393140.0, ans=0.125
2023-11-28 06:26:29,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3393206.6666666665, ans=0.2
2023-11-28 06:26:42,372 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509000
2023-11-28 06:26:46,281 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4000, loss[loss=0.06, simple_loss=0.08476, pruned_loss=0.01059, audio_tagging_loss=0.007027, over 15062.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09048, pruned_loss=0.01255, audio_tagging_loss=0.008869, over 3039506.37 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 06:26:46,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3393340.0, ans=0.07
2023-11-28 06:26:47,325 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.879e+01 9.661e+01 1.044e+02 1.748e+02, threshold=1.932e+02, percent-clipped=0.0
2023-11-28 06:26:51,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3393340.0, ans=0.125
2023-11-28 06:27:03,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3393406.6666666665, ans=0.0
2023-11-28 06:27:07,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3393406.6666666665, ans=0.2
2023-11-28 06:27:07,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3393406.6666666665, ans=0.125
2023-11-28 06:27:10,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=22.5
2023-11-28 06:27:12,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3393473.3333333335, ans=10.0
2023-11-28 06:27:24,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3393540.0, ans=0.125
2023-11-28 06:27:39,971 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509050
2023-11-28 06:27:44,183 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4050, loss[loss=0.08536, simple_loss=0.1229, pruned_loss=0.01523, audio_tagging_loss=0.008702, over 15483.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09105, pruned_loss=0.01271, audio_tagging_loss=0.008904, over 3045324.86 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:27:50,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0
2023-11-28 06:27:50,670 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 06:28:01,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3393740.0, ans=22.5
2023-11-28 06:28:04,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3393740.0, ans=0.0
2023-11-28 06:28:18,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3393873.3333333335, ans=0.125
2023-11-28 06:28:20,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=3393873.3333333335, ans=12.0
2023-11-28 06:28:37,560 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509100
2023-11-28 06:28:40,828 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4100, loss[loss=0.0805, simple_loss=0.1135, pruned_loss=0.01569, audio_tagging_loss=0.008087, over 16215.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09068, pruned_loss=0.01247, audio_tagging_loss=0.008852, over 3048518.94 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:28:42,159 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 06:28:43,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.020e+01 9.493e+01 1.014e+02 1.731e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-28 06:28:48,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3394006.6666666665, ans=0.125
2023-11-28 06:28:52,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3394073.3333333335, ans=0.1
2023-11-28 06:28:59,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3394073.3333333335, ans=0.2
2023-11-28 06:29:01,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3394073.3333333335, ans=0.2
2023-11-28 06:29:20,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3394206.6666666665, ans=0.0
2023-11-28 06:29:35,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509150
2023-11-28 06:29:38,567 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4150, loss[loss=0.05986, simple_loss=0.07196, pruned_loss=0.01434, audio_tagging_loss=0.009549, over 15027.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09077, pruned_loss=0.01245, audio_tagging_loss=0.008733, over 3051446.08 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:29:47,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3394340.0, ans=15.0
2023-11-28 06:30:00,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.58 vs. limit=10.0
2023-11-28 06:30:21,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3394540.0, ans=0.1
2023-11-28 06:30:25,684 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 06:30:33,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509200
2023-11-28 06:30:36,811 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4200, loss[loss=0.06689, simple_loss=0.09145, pruned_loss=0.01448, audio_tagging_loss=0.006684, over 14936.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09008, pruned_loss=0.01228, audio_tagging_loss=0.008703, over 3051613.86 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:30:39,982 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.720e+01 9.332e+01 1.017e+02 1.296e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-28 06:30:49,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3394740.0, ans=0.0
2023-11-28 06:30:49,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=8.0
2023-11-28 06:31:12,593 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 06:31:14,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3394873.3333333335, ans=0.2
2023-11-28 06:31:32,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509250
2023-11-28 06:31:35,363 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4250, loss[loss=0.06382, simple_loss=0.0876, pruned_loss=0.01153, audio_tagging_loss=0.008492, over 14814.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09148, pruned_loss=0.01249, audio_tagging_loss=0.008529, over 3052238.09 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:31:43,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3395006.6666666665, ans=0.1
2023-11-28 06:31:49,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3395073.3333333335, ans=0.2
2023-11-28 06:31:50,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3395073.3333333335, ans=0.125
2023-11-28 06:31:56,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3395073.3333333335, ans=0.2
2023-11-28 06:31:59,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5
2023-11-28 06:32:16,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0
2023-11-28 06:32:29,168 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509300
2023-11-28 06:32:33,200 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4300, loss[loss=0.05672, simple_loss=0.07834, pruned_loss=0.008167, audio_tagging_loss=0.009385, over 15351.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09192, pruned_loss=0.01246, audio_tagging_loss=0.008448, over 3051506.73 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:32:34,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3395340.0, ans=0.0
2023-11-28 06:32:35,395 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.011e+01 8.843e+01 9.507e+01 1.023e+02 2.128e+02, threshold=1.901e+02, percent-clipped=1.0
2023-11-28 06:32:37,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3395340.0, ans=0.125
2023-11-28 06:32:39,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3395340.0, ans=0.025
2023-11-28 06:32:46,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3395406.6666666665, ans=0.1
2023-11-28 06:32:46,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5
2023-11-28 06:32:47,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3395406.6666666665, ans=0.125
2023-11-28 06:32:54,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3395406.6666666665, ans=0.125
2023-11-28 06:33:22,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=22.5
2023-11-28 06:33:24,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3395606.6666666665, ans=0.2
2023-11-28 06:33:27,958 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509350
2023-11-28 06:33:31,223 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4350, loss[loss=0.07015, simple_loss=0.09807, pruned_loss=0.0117, audio_tagging_loss=0.00942, over 15069.00 frames. ], tot_loss[loss=0.067, simple_loss=0.092, pruned_loss=0.01252, audio_tagging_loss=0.008475, over 3049306.09 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 06:33:31,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0
2023-11-28 06:33:40,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3395673.3333333335, ans=0.0
2023-11-28 06:34:06,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=12.0
2023-11-28 06:34:09,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5
limit=22.5 2023-11-28 06:34:10,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3395873.3333333335, ans=0.125 2023-11-28 06:34:11,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3395873.3333333335, ans=0.2 2023-11-28 06:34:23,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. limit=10.0 2023-11-28 06:34:24,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3395940.0, ans=0.125 2023-11-28 06:34:26,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509400 2023-11-28 06:34:29,643 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4400, loss[loss=0.06447, simple_loss=0.09094, pruned_loss=0.01056, audio_tagging_loss=0.008441, over 15430.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.0926, pruned_loss=0.01267, audio_tagging_loss=0.008482, over 3047822.15 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:34:31,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.958e+01 9.354e+01 1.037e+02 1.325e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 06:35:08,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2023-11-28 06:35:12,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3396206.6666666665, ans=0.125 2023-11-28 06:35:16,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3396273.3333333335, ans=0.125 2023-11-28 06:35:23,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509450 2023-11-28 06:35:23,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3396273.3333333335, ans=0.0 2023-11-28 06:35:26,823 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4450, loss[loss=0.06672, simple_loss=0.09798, pruned_loss=0.01194, audio_tagging_loss=0.005788, over 14728.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09319, pruned_loss=0.01267, audio_tagging_loss=0.008386, over 3053543.74 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:35:28,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. 
limit=15.0 2023-11-28 06:35:33,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3396340.0, ans=0.125 2023-11-28 06:35:43,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3396406.6666666665, ans=0.1 2023-11-28 06:35:49,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3396473.3333333335, ans=0.125 2023-11-28 06:35:52,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3396473.3333333335, ans=0.2 2023-11-28 06:36:14,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3396606.6666666665, ans=0.125 2023-11-28 06:36:21,605 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509500 2023-11-28 06:36:24,869 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4500, loss[loss=0.07461, simple_loss=0.1054, pruned_loss=0.0156, audio_tagging_loss=0.006289, over 15235.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09243, pruned_loss=0.0126, audio_tagging_loss=0.008403, over 3059102.03 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:36:27,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.759e+01 9.220e+01 9.806e+01 1.292e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-28 06:36:29,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3396673.3333333335, ans=0.125 2023-11-28 06:36:31,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3396673.3333333335, ans=0.125 2023-11-28 06:36:32,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3396673.3333333335, ans=0.125 2023-11-28 06:36:51,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3396806.6666666665, ans=0.015 2023-11-28 06:37:19,895 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509550 2023-11-28 06:37:23,167 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4550, loss[loss=0.06293, simple_loss=0.08352, pruned_loss=0.01312, audio_tagging_loss=0.008046, over 14575.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09156, pruned_loss=0.01258, audio_tagging_loss=0.008489, over 3057155.67 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:37:23,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.34 vs. limit=22.5 2023-11-28 06:37:25,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3397006.6666666665, ans=0.0 2023-11-28 06:37:37,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0 2023-11-28 06:38:00,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.75 vs. 
limit=15.0 2023-11-28 06:38:03,804 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:38:07,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3397206.6666666665, ans=0.95 2023-11-28 06:38:11,270 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:38:15,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3397273.3333333335, ans=0.1 2023-11-28 06:38:16,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509600 2023-11-28 06:38:18,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3397273.3333333335, ans=0.125 2023-11-28 06:38:18,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3397273.3333333335, ans=0.1 2023-11-28 06:38:20,303 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4600, loss[loss=0.05478, simple_loss=0.07783, pruned_loss=0.008479, audio_tagging_loss=0.007381, over 14233.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09011, pruned_loss=0.01223, audio_tagging_loss=0.008589, over 3058333.85 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:38:22,436 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.724e+01 9.423e+01 1.019e+02 1.398e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 06:38:33,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-28 06:38:35,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2023-11-28 06:39:00,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3397540.0, ans=0.125 2023-11-28 06:39:03,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3397540.0, ans=0.125 2023-11-28 06:39:10,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5 2023-11-28 06:39:14,828 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509650 2023-11-28 06:39:16,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3397606.6666666665, ans=0.09899494936611666 2023-11-28 06:39:17,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3397673.3333333335, ans=0.125 2023-11-28 06:39:18,108 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4650, loss[loss=0.07703, simple_loss=0.1085, pruned_loss=0.01454, audio_tagging_loss=0.008225, over 15669.00 frames. 
], tot_loss[loss=0.06585, simple_loss=0.08973, pruned_loss=0.01231, audio_tagging_loss=0.00867, over 3060671.38 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:39:22,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3397673.3333333335, ans=0.125 2023-11-28 06:39:23,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3397673.3333333335, ans=0.0 2023-11-28 06:39:39,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3397740.0, ans=0.125 2023-11-28 06:40:07,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3397940.0, ans=0.125 2023-11-28 06:40:13,658 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509700 2023-11-28 06:40:16,810 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4700, loss[loss=0.0752, simple_loss=0.1058, pruned_loss=0.01476, audio_tagging_loss=0.007526, over 15214.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09034, pruned_loss=0.01243, audio_tagging_loss=0.008746, over 3063123.80 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:40:17,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3398006.6666666665, ans=0.125 2023-11-28 06:40:18,958 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 8.857e+01 9.480e+01 1.024e+02 1.425e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 06:41:10,392 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509750 2023-11-28 06:41:13,604 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4750, loss[loss=0.05586, simple_loss=0.06964, pruned_loss=0.01187, audio_tagging_loss=0.00917, over 14479.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08968, pruned_loss=0.01243, audio_tagging_loss=0.00882, over 3064429.76 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:41:14,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=15.0 2023-11-28 06:41:22,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2023-11-28 06:41:37,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3398473.3333333335, ans=0.2 2023-11-28 06:42:04,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=22.5 2023-11-28 06:42:06,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3398606.6666666665, ans=0.04949747468305833 2023-11-28 06:42:07,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509800 2023-11-28 06:42:11,445 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4800, loss[loss=0.08053, simple_loss=0.11, pruned_loss=0.0175, audio_tagging_loss=0.008035, over 15077.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.089, pruned_loss=0.01233, audio_tagging_loss=0.008912, over 3061918.31 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:42:13,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.791e+01 9.387e+01 1.001e+02 1.346e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 06:42:13,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3398673.3333333335, ans=0.125 2023-11-28 06:42:17,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3398673.3333333335, ans=0.125 2023-11-28 06:42:34,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3398806.6666666665, ans=0.1 2023-11-28 06:42:44,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3398806.6666666665, ans=0.1 2023-11-28 06:43:03,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3398940.0, ans=0.0 2023-11-28 06:43:05,726 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509850 2023-11-28 06:43:09,514 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4850, loss[loss=0.07279, simple_loss=0.1018, pruned_loss=0.01169, audio_tagging_loss=0.01019, over 15079.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08854, pruned_loss=0.01227, audio_tagging_loss=0.009067, over 3049670.58 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:43:18,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3399006.6666666665, ans=0.1 2023-11-28 06:43:50,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3399206.6666666665, ans=0.0 2023-11-28 06:44:02,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509900 2023-11-28 06:44:05,794 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4900, loss[loss=0.07185, simple_loss=0.1052, pruned_loss=0.0134, audio_tagging_loss=0.005824, over 14233.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08944, pruned_loss=0.01237, audio_tagging_loss=0.008963, over 3049758.34 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:44:07,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.789e+01 9.268e+01 1.027e+02 1.406e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 06:44:10,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3399340.0, ans=0.125 2023-11-28 06:44:21,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3399406.6666666665, ans=0.07 2023-11-28 06:44:25,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3399406.6666666665, ans=0.125 2023-11-28 06:44:27,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. 
limit=10.0 2023-11-28 06:44:32,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3399473.3333333335, ans=0.1 2023-11-28 06:44:35,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2023-11-28 06:44:49,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3399540.0, ans=0.0 2023-11-28 06:44:57,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3399606.6666666665, ans=0.0 2023-11-28 06:44:59,550 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509950 2023-11-28 06:45:03,470 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4950, loss[loss=0.06964, simple_loss=0.08944, pruned_loss=0.01647, audio_tagging_loss=0.008453, over 14899.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08988, pruned_loss=0.01241, audio_tagging_loss=0.008827, over 3046973.37 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:45:12,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3399673.3333333335, ans=0.0 2023-11-28 06:45:37,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3399873.3333333335, ans=0.1 2023-11-28 06:45:49,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2023-11-28 06:45:53,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-11-28 06:45:57,758 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510000 2023-11-28 06:46:01,573 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5000, loss[loss=0.05587, simple_loss=0.07225, pruned_loss=0.01214, audio_tagging_loss=0.007605, over 14581.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09015, pruned_loss=0.01267, audio_tagging_loss=0.008788, over 3038196.52 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:46:03,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3400006.6666666665, ans=0.2 2023-11-28 06:46:05,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.682e+01 9.362e+01 1.003e+02 1.327e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 06:46:09,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3400006.6666666665, ans=0.05 2023-11-28 06:46:22,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3400073.3333333335, ans=0.0 2023-11-28 06:46:26,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2023-11-28 06:46:30,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. 
limit=15.0 2023-11-28 06:46:35,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3400206.6666666665, ans=0.2 2023-11-28 06:46:37,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3400206.6666666665, ans=0.0 2023-11-28 06:46:44,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5 2023-11-28 06:46:54,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.04 vs. limit=15.0 2023-11-28 06:46:55,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3400273.3333333335, ans=0.0 2023-11-28 06:46:55,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3400273.3333333335, ans=0.2 2023-11-28 06:46:56,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510050 2023-11-28 06:46:59,731 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5050, loss[loss=0.06554, simple_loss=0.09051, pruned_loss=0.01357, audio_tagging_loss=0.006712, over 14989.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08977, pruned_loss=0.01248, audio_tagging_loss=0.008731, over 3039776.90 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:47:22,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3400473.3333333335, ans=0.2 2023-11-28 06:47:53,300 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510100 2023-11-28 06:47:56,491 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5100, loss[loss=0.07943, simple_loss=0.1098, pruned_loss=0.01317, audio_tagging_loss=0.01135, over 14933.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08939, pruned_loss=0.01245, audio_tagging_loss=0.008699, over 3043927.00 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:48:00,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.708e+01 9.390e+01 1.019e+02 1.146e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 06:48:07,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3400740.0, ans=0.125 2023-11-28 06:48:07,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=22.5 2023-11-28 06:48:14,794 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:48:18,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. 
limit=15.0 2023-11-28 06:48:22,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400806.6666666665, ans=0.1 2023-11-28 06:48:33,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3400873.3333333335, ans=0.1 2023-11-28 06:48:50,995 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510150 2023-11-28 06:48:54,195 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5150, loss[loss=0.08207, simple_loss=0.1209, pruned_loss=0.01502, audio_tagging_loss=0.006606, over 14508.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08925, pruned_loss=0.0123, audio_tagging_loss=0.008738, over 3042668.35 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:49:02,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2023-11-28 06:49:16,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3401140.0, ans=0.1 2023-11-28 06:49:18,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3401140.0, ans=0.125 2023-11-28 06:49:21,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-11-28 06:49:38,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3401206.6666666665, ans=0.2 2023-11-28 06:49:44,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3401273.3333333335, ans=0.0 2023-11-28 06:49:49,080 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510200 2023-11-28 06:49:50,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3401273.3333333335, ans=0.0 2023-11-28 06:49:52,688 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5200, loss[loss=0.05239, simple_loss=0.06847, pruned_loss=0.01148, audio_tagging_loss=0.006678, over 14909.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08962, pruned_loss=0.01237, audio_tagging_loss=0.008679, over 3041945.69 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:49:56,610 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.684e+01 9.283e+01 1.002e+02 1.274e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 06:50:01,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3401340.0, ans=0.0 2023-11-28 06:50:08,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3401406.6666666665, ans=10.0 2023-11-28 06:50:11,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3401406.6666666665, ans=0.0 2023-11-28 06:50:11,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.26 vs. 
limit=15.0 2023-11-28 06:50:21,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3401473.3333333335, ans=0.125 2023-11-28 06:50:47,136 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510250 2023-11-28 06:50:50,401 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5250, loss[loss=0.07, simple_loss=0.1064, pruned_loss=0.01072, audio_tagging_loss=0.006098, over 14450.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09008, pruned_loss=0.01262, audio_tagging_loss=0.0086, over 3039962.82 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:51:05,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3401740.0, ans=0.2 2023-11-28 06:51:27,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3401873.3333333335, ans=0.0 2023-11-28 06:51:27,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0 2023-11-28 06:51:37,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3401940.0, ans=0.125 2023-11-28 06:51:44,511 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510300 2023-11-28 06:51:46,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3402006.6666666665, ans=0.0 2023-11-28 06:51:47,662 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5300, loss[loss=0.04847, simple_loss=0.05759, pruned_loss=0.008827, audio_tagging_loss=0.01085, over 13950.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.0906, pruned_loss=0.01262, audio_tagging_loss=0.008665, over 3040415.77 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:51:48,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3402006.6666666665, ans=0.125 2023-11-28 06:51:50,942 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.883e+01 9.472e+01 1.016e+02 1.198e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 06:51:56,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3402006.6666666665, ans=0.0 2023-11-28 06:52:04,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3402073.3333333335, ans=0.125 2023-11-28 06:52:11,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3402140.0, ans=0.0 2023-11-28 06:52:12,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3402140.0, ans=0.1 2023-11-28 06:52:12,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3402140.0, ans=0.07 2023-11-28 06:52:42,465 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510350 2023-11-28 06:52:45,616 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5350, loss[loss=0.05635, simple_loss=0.08023, pruned_loss=0.01003, audio_tagging_loss=0.006205, over 14804.00 frames. 
], tot_loss[loss=0.06661, simple_loss=0.09073, pruned_loss=0.01264, audio_tagging_loss=0.008612, over 3038019.85 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:52:58,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3402406.6666666665, ans=0.04949747468305833 2023-11-28 06:53:21,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3402540.0, ans=0.0 2023-11-28 06:53:21,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2023-11-28 06:53:24,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.00 vs. limit=22.5 2023-11-28 06:53:38,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3402606.6666666665, ans=0.2 2023-11-28 06:53:39,048 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510400 2023-11-28 06:53:43,124 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5400, loss[loss=0.06127, simple_loss=0.08759, pruned_loss=0.01133, audio_tagging_loss=0.006151, over 14189.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09078, pruned_loss=0.01252, audio_tagging_loss=0.008625, over 3041869.45 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:53:47,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.616e+01 9.187e+01 1.017e+02 1.243e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-28 06:53:55,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3402740.0, ans=0.125 2023-11-28 06:54:13,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.56 vs. limit=15.0 2023-11-28 06:54:14,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3402806.6666666665, ans=0.1 2023-11-28 06:54:16,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3402873.3333333335, ans=0.125 2023-11-28 06:54:21,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3402873.3333333335, ans=0.0 2023-11-28 06:54:34,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-28 06:54:37,152 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510450 2023-11-28 06:54:40,381 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5450, loss[loss=0.03966, simple_loss=0.04595, pruned_loss=0.005281, audio_tagging_loss=0.01141, over 14410.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09098, pruned_loss=0.01245, audio_tagging_loss=0.008565, over 3041639.90 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:54:51,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3403073.3333333335, ans=0.125 2023-11-28 06:54:51,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403073.3333333335, ans=0.1 2023-11-28 06:55:06,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0 2023-11-28 06:55:25,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3403273.3333333335, ans=0.95 2023-11-28 06:55:34,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510500 2023-11-28 06:55:38,008 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5500, loss[loss=0.0656, simple_loss=0.09172, pruned_loss=0.0117, audio_tagging_loss=0.008042, over 15460.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08992, pruned_loss=0.0124, audio_tagging_loss=0.008763, over 3034770.29 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:55:42,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.928e+01 9.472e+01 1.024e+02 1.464e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 06:55:45,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3403340.0, ans=0.125 2023-11-28 06:55:51,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2023-11-28 06:55:59,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3403473.3333333335, ans=0.125 2023-11-28 06:56:06,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=12.0 2023-11-28 06:56:15,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403540.0, ans=0.1 2023-11-28 06:56:16,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403540.0, ans=0.1 2023-11-28 06:56:18,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3403540.0, ans=0.125 2023-11-28 06:56:19,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3403540.0, ans=0.2 2023-11-28 06:56:31,785 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510550 2023-11-28 06:56:34,972 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5550, loss[loss=0.05474, simple_loss=0.06683, pruned_loss=0.009373, audio_tagging_loss=0.01195, over 15825.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09037, pruned_loss=0.01248, audio_tagging_loss=0.008826, over 3034399.37 frames. 
], batch size: 62, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:56:36,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3403673.3333333335, ans=0.125 2023-11-28 06:56:43,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3403673.3333333335, ans=0.125 2023-11-28 06:56:53,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=12.0 2023-11-28 06:57:06,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3403806.6666666665, ans=0.0 2023-11-28 06:57:13,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3403873.3333333335, ans=0.0 2023-11-28 06:57:19,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3403940.0, ans=0.07 2023-11-28 06:57:19,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3403940.0, ans=0.0 2023-11-28 06:57:22,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=22.5 2023-11-28 06:57:29,865 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510600 2023-11-28 06:57:31,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3403940.0, ans=0.2 2023-11-28 06:57:32,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3404006.6666666665, ans=0.035 2023-11-28 06:57:32,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3404006.6666666665, ans=0.0 2023-11-28 06:57:32,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3404006.6666666665, ans=0.07 2023-11-28 06:57:33,360 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5600, loss[loss=0.07111, simple_loss=0.09407, pruned_loss=0.01517, audio_tagging_loss=0.008903, over 15456.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09005, pruned_loss=0.01252, audio_tagging_loss=0.008991, over 3040472.99 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:57:34,724 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:57:35,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3404006.6666666665, ans=0.0 2023-11-28 06:57:36,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2023-11-28 06:57:37,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2023-11-28 06:57:37,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 9.053e+01 9.702e+01 1.068e+02 1.336e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-28 06:57:53,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.89 vs. 
limit=15.0 2023-11-28 06:58:09,366 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:58:11,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0 2023-11-28 06:58:17,460 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:58:27,112 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510650 2023-11-28 06:58:30,285 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5650, loss[loss=0.06256, simple_loss=0.07449, pruned_loss=0.01188, audio_tagging_loss=0.01344, over 15202.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09054, pruned_loss=0.01269, audio_tagging_loss=0.008991, over 3047822.82 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:58:31,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404340.0, ans=0.1 2023-11-28 06:58:35,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3404340.0, ans=0.125 2023-11-28 06:58:48,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-11-28 06:58:49,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3404406.6666666665, ans=0.2 2023-11-28 06:58:52,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3404473.3333333335, ans=0.125 2023-11-28 06:59:03,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3404540.0, ans=0.1 2023-11-28 06:59:05,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3404540.0, ans=0.0 2023-11-28 06:59:07,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3404540.0, ans=0.0 2023-11-28 06:59:16,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3404606.6666666665, ans=0.125 2023-11-28 06:59:23,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3404606.6666666665, ans=0.0 2023-11-28 06:59:24,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510700 2023-11-28 06:59:27,442 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5700, loss[loss=0.07447, simple_loss=0.1065, pruned_loss=0.01327, audio_tagging_loss=0.007954, over 16112.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08961, pruned_loss=0.01233, audio_tagging_loss=0.009056, over 3052756.46 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:59:32,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3404673.3333333335, ans=0.0 2023-11-28 06:59:32,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.734e+01 9.325e+01 1.007e+02 1.153e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 06:59:33,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3404673.3333333335, ans=0.0 2023-11-28 06:59:59,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. limit=10.0 2023-11-28 07:00:05,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3404873.3333333335, ans=0.0 2023-11-28 07:00:20,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3404940.0, ans=0.125 2023-11-28 07:00:21,515 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510750 2023-11-28 07:00:24,725 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5750, loss[loss=0.06863, simple_loss=0.1009, pruned_loss=0.01164, audio_tagging_loss=0.006534, over 15676.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08911, pruned_loss=0.01223, audio_tagging_loss=0.008906, over 3048282.76 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:00:35,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3405073.3333333335, ans=0.1 2023-11-28 07:00:53,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.93 vs. limit=15.0 2023-11-28 07:00:57,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3405140.0, ans=0.125 2023-11-28 07:01:18,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510800 2023-11-28 07:01:22,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2023-11-28 07:01:22,718 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5800, loss[loss=0.06938, simple_loss=0.09842, pruned_loss=0.01316, audio_tagging_loss=0.007008, over 14671.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08949, pruned_loss=0.01227, audio_tagging_loss=0.008755, over 3047948.38 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:01:28,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.777e+01 9.348e+01 1.032e+02 1.624e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 07:01:43,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3405473.3333333335, ans=0.05 2023-11-28 07:01:43,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3405473.3333333335, ans=0.0 2023-11-28 07:02:12,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. 
limit=15.0 2023-11-28 07:02:16,402 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510850 2023-11-28 07:02:17,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3405606.6666666665, ans=0.125 2023-11-28 07:02:19,693 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5850, loss[loss=0.06843, simple_loss=0.08488, pruned_loss=0.01458, audio_tagging_loss=0.01141, over 14703.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08966, pruned_loss=0.01222, audio_tagging_loss=0.008599, over 3043626.96 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:02:27,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-11-28 07:02:31,853 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:02:33,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3405740.0, ans=0.0 2023-11-28 07:02:36,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-28 07:02:36,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3405740.0, ans=0.2 2023-11-28 07:02:58,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3405873.3333333335, ans=0.1 2023-11-28 07:03:03,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3405873.3333333335, ans=0.04949747468305833 2023-11-28 07:03:04,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=15.0 2023-11-28 07:03:13,171 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510900 2023-11-28 07:03:16,958 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5900, loss[loss=0.07956, simple_loss=0.1052, pruned_loss=0.01771, audio_tagging_loss=0.00924, over 15716.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08952, pruned_loss=0.01227, audio_tagging_loss=0.008599, over 3048050.46 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:03:22,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.808e+01 9.419e+01 9.961e+01 1.259e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:03:24,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3406006.6666666665, ans=0.125 2023-11-28 07:03:44,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3406140.0, ans=0.0 2023-11-28 07:03:49,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3406140.0, ans=15.0 2023-11-28 07:03:53,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3406206.6666666665, ans=0.0 2023-11-28 07:04:04,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3406273.3333333335, ans=0.0 2023-11-28 07:04:11,284 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510950 2023-11-28 07:04:14,894 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5950, loss[loss=0.066, simple_loss=0.0908, pruned_loss=0.01221, audio_tagging_loss=0.008394, over 15473.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08941, pruned_loss=0.01218, audio_tagging_loss=0.008563, over 3051001.03 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:04:18,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3406340.0, ans=0.1 2023-11-28 07:04:38,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3406473.3333333335, ans=0.0 2023-11-28 07:04:40,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=15.0 2023-11-28 07:05:02,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.88 vs. limit=15.0 2023-11-28 07:05:06,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3406606.6666666665, ans=0.125 2023-11-28 07:05:09,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511000 2023-11-28 07:05:12,613 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6000, loss[loss=0.06245, simple_loss=0.0815, pruned_loss=0.01177, audio_tagging_loss=0.009935, over 14988.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08953, pruned_loss=0.01228, audio_tagging_loss=0.008557, over 3052406.34 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:05:12,614 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 07:05:29,393 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.5911, 3.9687, 3.5120, 3.6246], device='cuda:3') 2023-11-28 07:05:41,363 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4186, 2.9755, 3.2319, 2.9426, 3.6564, 3.7787, 3.2244, 3.2479], device='cuda:3') 2023-11-28 07:05:47,612 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.0577, simple_loss=0.05058, pruned_loss=0.005244, audio_tagging_loss=0.02717, over 4681554.00 frames. 
2023-11-28 07:05:47,613 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 07:05:49,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3406673.3333333335, ans=0.0 2023-11-28 07:05:53,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.750e+01 9.275e+01 1.001e+02 1.273e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-28 07:05:53,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3406673.3333333335, ans=0.125 2023-11-28 07:06:04,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406740.0, ans=0.1 2023-11-28 07:06:14,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3406806.6666666665, ans=0.125 2023-11-28 07:06:24,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3406873.3333333335, ans=0.1 2023-11-28 07:06:32,446 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:06:37,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3406940.0, ans=0.125 2023-11-28 07:06:41,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511050 2023-11-28 07:06:42,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3406940.0, ans=0.125 2023-11-28 07:06:44,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3407006.6666666665, ans=0.0 2023-11-28 07:06:45,682 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6050, loss[loss=0.06339, simple_loss=0.08705, pruned_loss=0.01064, audio_tagging_loss=0.009229, over 14460.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08964, pruned_loss=0.01243, audio_tagging_loss=0.008598, over 3050346.45 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:07:25,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3407206.6666666665, ans=0.0 2023-11-28 07:07:31,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2023-11-28 07:07:38,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3407273.3333333335, ans=0.1 2023-11-28 07:07:39,119 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511100 2023-11-28 07:07:42,399 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6100, loss[loss=0.07961, simple_loss=0.1125, pruned_loss=0.01499, audio_tagging_loss=0.008367, over 15882.00 frames. 
], tot_loss[loss=0.06617, simple_loss=0.09038, pruned_loss=0.01247, audio_tagging_loss=0.008517, over 3059768.22 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:07:47,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.837e+01 9.364e+01 1.005e+02 1.238e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 07:08:22,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3407540.0, ans=0.2 2023-11-28 07:08:36,536 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511150 2023-11-28 07:08:39,903 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6150, loss[loss=0.07734, simple_loss=0.1107, pruned_loss=0.0141, audio_tagging_loss=0.007896, over 15558.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08961, pruned_loss=0.01253, audio_tagging_loss=0.008544, over 3054525.66 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:08:49,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3407673.3333333335, ans=0.1 2023-11-28 07:08:55,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3407740.0, ans=0.0 2023-11-28 07:09:02,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3407806.6666666665, ans=0.0 2023-11-28 07:09:05,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3407806.6666666665, ans=0.125 2023-11-28 07:09:10,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3407806.6666666665, ans=0.125 2023-11-28 07:09:32,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3407940.0, ans=0.125 2023-11-28 07:09:32,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3407940.0, ans=0.0 2023-11-28 07:09:33,516 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511200 2023-11-28 07:09:37,604 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6200, loss[loss=0.05683, simple_loss=0.07685, pruned_loss=0.008015, audio_tagging_loss=0.01039, over 17400.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.0889, pruned_loss=0.0124, audio_tagging_loss=0.008768, over 3053645.74 frames. ], batch size: 66, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:09:43,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.472e+01 8.632e+01 9.387e+01 1.018e+02 1.235e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 07:09:59,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.01 vs. 
limit=10.0 2023-11-28 07:10:11,923 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:10:23,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3408273.3333333335, ans=0.1 2023-11-28 07:10:31,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511250 2023-11-28 07:10:34,779 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6250, loss[loss=0.06971, simple_loss=0.1001, pruned_loss=0.0123, audio_tagging_loss=0.007361, over 14406.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08905, pruned_loss=0.01246, audio_tagging_loss=0.008848, over 3048187.79 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:10:36,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3408340.0, ans=0.2 2023-11-28 07:10:41,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3408340.0, ans=0.125 2023-11-28 07:10:46,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3408406.6666666665, ans=0.0 2023-11-28 07:10:51,192 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:10:55,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.18 vs. limit=12.0 2023-11-28 07:10:58,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3408473.3333333335, ans=0.0 2023-11-28 07:11:05,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.17 vs. limit=12.0 2023-11-28 07:11:21,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3408606.6666666665, ans=0.125 2023-11-28 07:11:28,915 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511300 2023-11-28 07:11:30,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3408606.6666666665, ans=0.125 2023-11-28 07:11:32,052 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6300, loss[loss=0.06196, simple_loss=0.08228, pruned_loss=0.01171, audio_tagging_loss=0.009112, over 14388.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08875, pruned_loss=0.01237, audio_tagging_loss=0.008998, over 3056551.82 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:11:38,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.880e+01 9.504e+01 1.024e+02 1.327e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 07:12:06,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3408873.3333333335, ans=0.125 2023-11-28 07:12:23,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.14 vs. 
limit=10.0 2023-11-28 07:12:26,510 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511350 2023-11-28 07:12:28,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3409006.6666666665, ans=0.0 2023-11-28 07:12:29,737 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6350, loss[loss=0.07203, simple_loss=0.1006, pruned_loss=0.01457, audio_tagging_loss=0.007166, over 15754.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0889, pruned_loss=0.01231, audio_tagging_loss=0.009012, over 3047722.23 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:12:35,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3409006.6666666665, ans=0.0 2023-11-28 07:12:39,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-28 07:12:45,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3409073.3333333335, ans=0.125 2023-11-28 07:12:45,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3409073.3333333335, ans=0.125 2023-11-28 07:13:16,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3409273.3333333335, ans=15.0 2023-11-28 07:13:24,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511400 2023-11-28 07:13:24,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3409273.3333333335, ans=0.125 2023-11-28 07:13:28,616 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6400, loss[loss=0.06845, simple_loss=0.09962, pruned_loss=0.01099, audio_tagging_loss=0.007642, over 15850.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08826, pruned_loss=0.01229, audio_tagging_loss=0.00909, over 3045239.98 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:13:35,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.831e+01 9.327e+01 9.903e+01 1.480e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 07:13:51,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3409473.3333333335, ans=0.125 2023-11-28 07:13:59,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3409473.3333333335, ans=0.0 2023-11-28 07:14:21,595 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511450 2023-11-28 07:14:22,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3409606.6666666665, ans=0.0 2023-11-28 07:14:24,817 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6450, loss[loss=0.05024, simple_loss=0.07237, pruned_loss=0.005405, audio_tagging_loss=0.008648, over 16353.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08818, pruned_loss=0.01227, audio_tagging_loss=0.009176, over 3034073.01 frames. 
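Across these entries the logged loss is consistent with a fixed linear combination of its components: 0.5 * simple_loss + pruned_loss + audio_tagging_loss reproduces each loss[...] line to rounding (batch 6350 above: 0.5*0.1006 + 0.01457 + 0.007166 = 0.072036, logged as loss=0.07203). The weights below are inferred from the logged numbers, not read out of the training script:

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_scale: float = 0.5,
                      tagging_scale: float = 1.0) -> float:
        # weights inferred from the logged values
        return (simple_scale * simple_loss
                + pruned_loss
                + tagging_scale * audio_tagging_loss)

    # batch 6350 above: 0.5*0.1006 + 0.01457 + 0.007166 ~ loss=0.07203
    assert abs(combined_loss(0.1006, 0.01457, 0.007166) - 0.07203) < 1e-4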
], batch size: 66, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:14:24,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3409673.3333333335, ans=0.125 2023-11-28 07:14:52,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3409806.6666666665, ans=0.1 2023-11-28 07:14:53,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2023-11-28 07:14:55,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.81 vs. limit=22.5 2023-11-28 07:15:18,605 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511500 2023-11-28 07:15:19,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2023-11-28 07:15:21,741 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6500, loss[loss=0.08332, simple_loss=0.1144, pruned_loss=0.01896, audio_tagging_loss=0.007144, over 16094.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08835, pruned_loss=0.01223, audio_tagging_loss=0.009131, over 3041591.19 frames. ], batch size: 65, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:15:28,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.791e+01 9.611e+01 1.014e+02 1.471e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 07:15:29,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.13 vs. limit=10.0 2023-11-28 07:15:45,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3410140.0, ans=0.125 2023-11-28 07:15:52,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3410140.0, ans=0.2 2023-11-28 07:16:00,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3410206.6666666665, ans=0.125 2023-11-28 07:16:04,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3410206.6666666665, ans=0.1 2023-11-28 07:16:15,962 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511550 2023-11-28 07:16:19,179 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6550, loss[loss=0.05829, simple_loss=0.07896, pruned_loss=0.01103, audio_tagging_loss=0.007775, over 15105.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08912, pruned_loss=0.01223, audio_tagging_loss=0.008979, over 3041885.37 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:16:20,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3410340.0, ans=0.125 2023-11-28 07:16:24,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3410340.0, ans=0.125 2023-11-28 07:16:28,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3410340.0, ans=0.0 2023-11-28 07:16:31,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.97 vs. 
limit=15.0 2023-11-28 07:16:35,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410406.6666666665, ans=0.1 2023-11-28 07:16:38,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410406.6666666665, ans=0.1 2023-11-28 07:16:51,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3410473.3333333335, ans=0.05 2023-11-28 07:17:01,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3410540.0, ans=0.2 2023-11-28 07:17:02,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3410540.0, ans=0.0 2023-11-28 07:17:05,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3410606.6666666665, ans=0.125 2023-11-28 07:17:05,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3410606.6666666665, ans=0.0 2023-11-28 07:17:06,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3410606.6666666665, ans=0.125 2023-11-28 07:17:06,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3410606.6666666665, ans=0.125 2023-11-28 07:17:09,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3410606.6666666665, ans=0.125 2023-11-28 07:17:10,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3410606.6666666665, ans=10.0 2023-11-28 07:17:12,737 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511600 2023-11-28 07:17:16,224 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6600, loss[loss=0.06449, simple_loss=0.08633, pruned_loss=0.01152, audio_tagging_loss=0.009802, over 14821.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08908, pruned_loss=0.01209, audio_tagging_loss=0.008798, over 3045970.66 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:17:24,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.683e+01 9.479e+01 1.018e+02 1.462e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 07:17:28,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3410740.0, ans=0.125 2023-11-28 07:17:33,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3410740.0, ans=0.1 2023-11-28 07:17:38,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3410806.6666666665, ans=0.0 2023-11-28 07:17:44,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3410806.6666666665, ans=0.125 2023-11-28 07:17:49,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.39 vs. 
limit=15.0 2023-11-28 07:17:56,402 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:17:58,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3410873.3333333335, ans=0.0 2023-11-28 07:18:01,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5 2023-11-28 07:18:02,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3410940.0, ans=0.025 2023-11-28 07:18:09,940 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511650 2023-11-28 07:18:13,102 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6650, loss[loss=0.05914, simple_loss=0.07469, pruned_loss=0.01318, audio_tagging_loss=0.008622, over 14903.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08937, pruned_loss=0.01215, audio_tagging_loss=0.008715, over 3045133.50 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:18:14,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3411006.6666666665, ans=0.0 2023-11-28 07:18:15,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3411006.6666666665, ans=0.025 2023-11-28 07:18:15,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2023-11-28 07:18:30,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3411073.3333333335, ans=0.1 2023-11-28 07:18:38,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3411140.0, ans=0.125 2023-11-28 07:18:49,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3411206.6666666665, ans=22.5 2023-11-28 07:19:01,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3411273.3333333335, ans=0.0 2023-11-28 07:19:07,112 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511700 2023-11-28 07:19:10,394 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6700, loss[loss=0.0473, simple_loss=0.05706, pruned_loss=0.01015, audio_tagging_loss=0.008624, over 15202.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08922, pruned_loss=0.01214, audio_tagging_loss=0.008563, over 3046886.22 frames. 
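The optim.py:476 lines report gradient-norm statistics gathered between logging points: the five numbers read naturally as (min, 25%, median, 75%, max) of recent gradient norms, and in every entry the threshold equals Clipping_scale times the median (e.g. 1.896e+02 = 2.0 * 9.479e+01 just above). A sketch of that bookkeeping under those assumptions, using torch.quantile for illustration rather than the actual optimizer internals:

    import torch

    def clipping_stats(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0):
        # (min, 25%, median, 75%, max) of the recent gradient norms
        quartiles = torch.quantile(
            recent_grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]  # 2.0 * median, as logged
        percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped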
], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:19:11,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3411340.0, ans=0.2 2023-11-28 07:19:17,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 9.075e+01 9.531e+01 1.012e+02 1.694e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 07:19:19,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3411340.0, ans=0.0 2023-11-28 07:19:22,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3411406.6666666665, ans=0.0 2023-11-28 07:19:39,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3411473.3333333335, ans=0.125 2023-11-28 07:19:44,776 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:20:00,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411606.6666666665, ans=0.0 2023-11-28 07:20:03,878 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511750 2023-11-28 07:20:07,116 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6750, loss[loss=0.06415, simple_loss=0.09272, pruned_loss=0.00921, audio_tagging_loss=0.008582, over 16073.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08885, pruned_loss=0.012, audio_tagging_loss=0.008672, over 3041407.63 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:20:07,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3411673.3333333335, ans=0.125 2023-11-28 07:20:12,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3411673.3333333335, ans=0.125 2023-11-28 07:20:15,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2023-11-28 07:20:30,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3411806.6666666665, ans=0.0 2023-11-28 07:20:43,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2023-11-28 07:20:48,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3411873.3333333335, ans=0.0 2023-11-28 07:20:55,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.17 vs. 
limit=22.5 2023-11-28 07:20:56,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3411940.0, ans=0.125 2023-11-28 07:21:00,791 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511800 2023-11-28 07:21:00,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3411940.0, ans=0.1 2023-11-28 07:21:04,734 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6800, loss[loss=0.0512, simple_loss=0.0616, pruned_loss=0.009606, audio_tagging_loss=0.01079, over 15663.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08737, pruned_loss=0.01181, audio_tagging_loss=0.008678, over 3033798.50 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:21:07,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3412006.6666666665, ans=0.0 2023-11-28 07:21:12,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.904e+01 9.309e+01 9.890e+01 1.281e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 07:21:16,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3412073.3333333335, ans=0.125 2023-11-28 07:21:18,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2023-11-28 07:21:32,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0 2023-11-28 07:21:33,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3412140.0, ans=0.0 2023-11-28 07:21:34,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3412140.0, ans=0.125 2023-11-28 07:21:39,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.71 vs. limit=10.0 2023-11-28 07:21:40,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3412206.6666666665, ans=0.125 2023-11-28 07:21:44,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-28 07:21:59,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511850 2023-11-28 07:22:02,564 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6850, loss[loss=0.06584, simple_loss=0.08488, pruned_loss=0.01286, audio_tagging_loss=0.01054, over 15446.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08717, pruned_loss=0.01182, audio_tagging_loss=0.008713, over 3039166.98 frames. 
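Each scaling.py:213 line prints the current value (ans=) of a ScheduledFloat: a hyperparameter such as a skip rate, dropout_p, or balancer prob that is scheduled as a piecewise-linear function of batch_count. Below is a minimal re-implementation of that idea, not the icefall class itself; the example schedule is invented to show why, this deep into training (batch_count ~ 3.4e6), every name keeps printing its final constant:

    def scheduled_float(batch_count: float, points) -> float:
        # points: ascending [(batch_count, value), ...]; linear in between,
        # clamped at both ends
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return points[-1][1]

    # an invented schedule: anneal 0.3 -> 0.1 over the first 20k batches,
    # then hold, so at batch_count ~ 3.4e6 the printed ans= is the final value
    assert scheduled_float(3406740.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1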
], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:22:16,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3412406.6666666665, ans=0.0 2023-11-28 07:22:17,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3412406.6666666665, ans=0.0 2023-11-28 07:22:56,258 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511900 2023-11-28 07:22:59,472 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6900, loss[loss=0.05611, simple_loss=0.07969, pruned_loss=0.007814, audio_tagging_loss=0.008453, over 15386.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08782, pruned_loss=0.01185, audio_tagging_loss=0.008673, over 3034044.42 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:23:07,193 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.771e+01 9.385e+01 1.023e+02 1.493e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 07:23:07,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3412673.3333333335, ans=0.1 2023-11-28 07:23:48,836 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:23:52,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3412940.0, ans=0.125 2023-11-28 07:23:53,280 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511950 2023-11-28 07:23:57,047 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6950, loss[loss=0.08043, simple_loss=0.1113, pruned_loss=0.01762, audio_tagging_loss=0.007129, over 14753.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08906, pruned_loss=0.01197, audio_tagging_loss=0.008546, over 3037124.28 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:24:02,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-11-28 07:24:05,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3413006.6666666665, ans=0.125 2023-11-28 07:24:05,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3413006.6666666665, ans=0.05 2023-11-28 07:24:17,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3413073.3333333335, ans=0.125 2023-11-28 07:24:20,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. 
limit=15.0 2023-11-28 07:24:35,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3413206.6666666665, ans=0.025 2023-11-28 07:24:51,237 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512000 2023-11-28 07:24:57,469 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7000, loss[loss=0.06358, simple_loss=0.07936, pruned_loss=0.01489, audio_tagging_loss=0.009002, over 14432.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08892, pruned_loss=0.01204, audio_tagging_loss=0.008578, over 3039893.66 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:25:06,154 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.579e+01 9.421e+01 1.029e+02 1.258e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:25:14,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3413406.6666666665, ans=0.1 2023-11-28 07:25:21,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3413473.3333333335, ans=0.0 2023-11-28 07:25:48,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3413606.6666666665, ans=0.1 2023-11-28 07:25:50,592 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512050 2023-11-28 07:25:53,836 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7050, loss[loss=0.0858, simple_loss=0.1192, pruned_loss=0.01901, audio_tagging_loss=0.007163, over 15588.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08845, pruned_loss=0.01193, audio_tagging_loss=0.008765, over 3037811.36 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:26:00,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3413673.3333333335, ans=0.125 2023-11-28 07:26:09,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.52 vs. limit=22.5 2023-11-28 07:26:10,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3413740.0, ans=0.1 2023-11-28 07:26:14,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.36 vs. limit=22.5 2023-11-28 07:26:34,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-28 07:26:43,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3413940.0, ans=0.125 2023-11-28 07:26:46,968 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512100 2023-11-28 07:26:50,147 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7100, loss[loss=0.0751, simple_loss=0.1018, pruned_loss=0.01627, audio_tagging_loss=0.007904, over 15882.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08902, pruned_loss=0.01203, audio_tagging_loss=0.008803, over 3047169.22 frames. 
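The scaling.py:1022 Whitening lines compare a per-module statistic against a limit (metric=12.54 vs. limit=15.0 above); only modules whose metric exceeds the limit get penalized. One plausible reading of the metric, consistent with it being 1.0 for perfectly "white" (isotropic) activations and growing as variance concentrates in a few directions, is the ratio of the mean squared eigenvalue of the activation covariance to the squared mean eigenvalue. The sketch below implements that reading and is not guaranteed to match scaling.py line for line:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations of one whitening group
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # 1.0 for isotropic (white) activations; larger when a few
        # directions dominate the variance
        return (eigs ** 2).mean() / eigs.mean() ** 2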
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:26:57,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3414006.6666666665, ans=0.125 2023-11-28 07:26:59,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3414006.6666666665, ans=0.0 2023-11-28 07:27:01,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 9.094e+01 9.538e+01 1.011e+02 1.389e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 07:27:19,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3414140.0, ans=0.07 2023-11-28 07:27:44,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512150 2023-11-28 07:27:48,105 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7150, loss[loss=0.07439, simple_loss=0.09805, pruned_loss=0.01672, audio_tagging_loss=0.008638, over 14772.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08948, pruned_loss=0.01207, audio_tagging_loss=0.008835, over 3044479.51 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 4.0 2023-11-28 07:28:41,959 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512200 2023-11-28 07:28:45,577 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7200, loss[loss=0.06774, simple_loss=0.08836, pruned_loss=0.00962, audio_tagging_loss=0.01393, over 15287.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09036, pruned_loss=0.01223, audio_tagging_loss=0.008815, over 3044186.46 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:28:46,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3414673.3333333335, ans=0.05 2023-11-28 07:28:52,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3414673.3333333335, ans=0.2 2023-11-28 07:28:55,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2023-11-28 07:28:56,395 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 8.861e+01 9.668e+01 1.042e+02 2.032e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-28 07:29:04,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3414740.0, ans=0.0 2023-11-28 07:29:15,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3414806.6666666665, ans=0.0 2023-11-28 07:29:25,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2023-11-28 07:29:35,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3414940.0, ans=0.125 2023-11-28 07:29:38,890 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512250 2023-11-28 07:29:42,137 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7250, loss[loss=0.05803, simple_loss=0.08053, pruned_loss=0.009641, audio_tagging_loss=0.008124, over 15122.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09092, pruned_loss=0.01226, audio_tagging_loss=0.008777, over 3042651.69 frames. 
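Each tot_loss[...] aggregate reports a frame count that hovers near 3.0e6 instead of growing with the epoch, which is consistent with a decayed running sum over recent batches: at ~15k frames per batch, a decay of 1 - 1/200 settles at about 15000 * 200 = 3.0e6 frames. The decay constant below is an assumption chosen to match that window, not a value taken from the code:

    class RunningLoss:
        # decayed running sum of (loss * frames, frames); the decay constant
        # is assumed, chosen so the window settles near 3.0e6 frames
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.weighted_loss = (self.decay * self.weighted_loss
                                  + batch_loss * batch_frames)
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.weighted_loss / max(self.frames, 1.0)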
], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:29:55,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-28 07:29:56,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3415073.3333333335, ans=0.125 2023-11-28 07:30:06,950 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:30:08,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3415140.0, ans=0.07 2023-11-28 07:30:16,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3415206.6666666665, ans=0.125 2023-11-28 07:30:21,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3415206.6666666665, ans=0.125 2023-11-28 07:30:25,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3415206.6666666665, ans=0.0 2023-11-28 07:30:27,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3415273.3333333335, ans=0.125 2023-11-28 07:30:34,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3415273.3333333335, ans=0.125 2023-11-28 07:30:35,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2023-11-28 07:30:36,111 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512300 2023-11-28 07:30:39,336 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7300, loss[loss=0.05264, simple_loss=0.07292, pruned_loss=0.006846, audio_tagging_loss=0.009329, over 16338.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09036, pruned_loss=0.0122, audio_tagging_loss=0.008771, over 3044735.02 frames. ], batch size: 63, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:30:48,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.21 vs. limit=15.0 2023-11-28 07:30:49,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3415340.0, ans=0.125 2023-11-28 07:30:51,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.861e+01 9.411e+01 1.033e+02 1.259e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 07:31:14,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3415540.0, ans=0.125 2023-11-28 07:31:19,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.69 vs. 
limit=22.5 2023-11-28 07:31:22,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3415540.0, ans=0.125 2023-11-28 07:31:33,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512350 2023-11-28 07:31:37,001 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7350, loss[loss=0.06245, simple_loss=0.09358, pruned_loss=0.00835, audio_tagging_loss=0.007311, over 15272.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09054, pruned_loss=0.01222, audio_tagging_loss=0.00868, over 3041235.86 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:31:38,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3415673.3333333335, ans=0.0 2023-11-28 07:31:40,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3415673.3333333335, ans=0.0 2023-11-28 07:31:49,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3415740.0, ans=0.0 2023-11-28 07:31:53,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-28 07:32:18,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.97 vs. limit=6.0 2023-11-28 07:32:29,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3415940.0, ans=0.025 2023-11-28 07:32:29,949 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512400 2023-11-28 07:32:31,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2023-11-28 07:32:33,359 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7400, loss[loss=0.0638, simple_loss=0.08316, pruned_loss=0.01329, audio_tagging_loss=0.008935, over 14657.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08998, pruned_loss=0.0121, audio_tagging_loss=0.008569, over 3039412.48 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:32:41,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.98 vs. limit=15.0 2023-11-28 07:32:44,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3416073.3333333335, ans=0.125 2023-11-28 07:32:44,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.822e+01 9.327e+01 1.016e+02 1.231e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 07:32:50,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2023-11-28 07:33:09,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-28 07:33:11,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.33 vs. 
limit=5.0 2023-11-28 07:33:27,684 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512450 2023-11-28 07:33:30,873 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7450, loss[loss=0.07199, simple_loss=0.1056, pruned_loss=0.01417, audio_tagging_loss=0.004996, over 16201.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09063, pruned_loss=0.0123, audio_tagging_loss=0.008418, over 3039478.56 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:33:31,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3416340.0, ans=0.125 2023-11-28 07:33:57,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3416473.3333333335, ans=0.125 2023-11-28 07:34:25,998 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512500 2023-11-28 07:34:29,281 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7500, loss[loss=0.07782, simple_loss=0.1196, pruned_loss=0.01271, audio_tagging_loss=0.005329, over 15411.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08998, pruned_loss=0.01217, audio_tagging_loss=0.008413, over 3040272.63 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:34:38,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3416673.3333333335, ans=0.0 2023-11-28 07:34:40,252 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.775e+01 9.275e+01 9.988e+01 1.436e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-28 07:34:51,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3416806.6666666665, ans=0.0 2023-11-28 07:35:06,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-28 07:35:09,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=22.5 2023-11-28 07:35:10,781 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:35:22,815 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512550 2023-11-28 07:35:26,331 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7550, loss[loss=0.04887, simple_loss=0.06061, pruned_loss=0.00708, audio_tagging_loss=0.01149, over 15371.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08993, pruned_loss=0.01223, audio_tagging_loss=0.008444, over 3043863.92 frames. 
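The grad_scale field tracks fp16 training's dynamic loss scale: over this stretch it drops from 32.0 to 16.0 and, by batch 7150, to 4.0, then doubles back to 8.0 by batch 7200, the signature of a scaler that halves on overflowing steps and grows back after a run of clean ones. A minimal sketch of that loop using PyTorch's stock GradScaler; the surrounding function and names are illustrative, not the training script's own:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def fp16_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if the step overflowed
        scaler.update()         # halves on overflow, grows back when stable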
], batch size: 60, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:36:05,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3417206.6666666665, ans=0.125 2023-11-28 07:36:07,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3417206.6666666665, ans=0.125 2023-11-28 07:36:07,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3417206.6666666665, ans=0.125 2023-11-28 07:36:10,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3417206.6666666665, ans=0.125 2023-11-28 07:36:12,102 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:36:20,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512600 2023-11-28 07:36:25,103 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7600, loss[loss=0.05841, simple_loss=0.07822, pruned_loss=0.007991, audio_tagging_loss=0.01131, over 14097.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08857, pruned_loss=0.01203, audio_tagging_loss=0.008525, over 3042086.12 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:36:25,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.70 vs. limit=10.0 2023-11-28 07:36:31,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=22.5 2023-11-28 07:36:36,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.736e+01 9.227e+01 9.964e+01 1.331e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 07:36:42,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3417406.6666666665, ans=0.125 2023-11-28 07:37:02,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3417540.0, ans=0.125 2023-11-28 07:37:20,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512650 2023-11-28 07:37:22,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3417673.3333333335, ans=15.0 2023-11-28 07:37:23,495 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7650, loss[loss=0.07103, simple_loss=0.08977, pruned_loss=0.01798, audio_tagging_loss=0.008165, over 14833.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08968, pruned_loss=0.01237, audio_tagging_loss=0.008475, over 3044320.39 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:37:39,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3417740.0, ans=0.125 2023-11-28 07:38:03,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3417873.3333333335, ans=0.0 2023-11-28 07:38:05,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3417873.3333333335, ans=0.2 2023-11-28 07:38:06,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3417873.3333333335, ans=0.125 2023-11-28 07:38:09,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3417940.0, ans=0.1 2023-11-28 07:38:12,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.64 vs. limit=22.5 2023-11-28 07:38:18,869 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512700 2023-11-28 07:38:20,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3417940.0, ans=0.0 2023-11-28 07:38:22,123 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7700, loss[loss=0.09426, simple_loss=0.1289, pruned_loss=0.02213, audio_tagging_loss=0.007699, over 15454.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09046, pruned_loss=0.01254, audio_tagging_loss=0.008526, over 3045666.97 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:38:22,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3418006.6666666665, ans=0.125 2023-11-28 07:38:24,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3418006.6666666665, ans=0.125 2023-11-28 07:38:34,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.901e+01 9.400e+01 1.006e+02 1.251e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 07:38:44,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3418140.0, ans=0.0 2023-11-28 07:38:44,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3418140.0, ans=0.125 2023-11-28 07:38:47,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3418140.0, ans=0.07 2023-11-28 07:38:51,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-11-28 07:39:04,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3418206.6666666665, ans=0.125 2023-11-28 07:39:22,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3418273.3333333335, ans=0.0 2023-11-28 07:39:31,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. 
limit=15.0 2023-11-28 07:39:51,832 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512750 2023-11-28 07:40:09,399 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7750, loss[loss=0.05265, simple_loss=0.06959, pruned_loss=0.006598, audio_tagging_loss=0.01125, over 16582.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09052, pruned_loss=0.01247, audio_tagging_loss=0.008486, over 3041812.33 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:41:22,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.89 vs. limit=22.5 2023-11-28 07:41:26,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-28 07:41:45,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2023-11-28 07:42:52,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2023-11-28 07:43:32,213 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512800 2023-11-28 07:43:32,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3418606.6666666665, ans=0.09899494936611666 2023-11-28 07:43:46,008 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7800, loss[loss=0.04943, simple_loss=0.0666, pruned_loss=0.004472, audio_tagging_loss=0.01166, over 14032.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09087, pruned_loss=0.01263, audio_tagging_loss=0.008581, over 3039152.73 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:44:31,024 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.859e+01 9.420e+01 1.032e+02 1.560e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:45:04,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3418806.6666666665, ans=0.2 2023-11-28 07:45:35,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.29 vs. limit=10.0 2023-11-28 07:46:21,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-11-28 07:46:37,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3418940.0, ans=0.2 2023-11-28 07:46:48,927 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512850 2023-11-28 07:47:01,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3419006.6666666665, ans=0.05 2023-11-28 07:47:05,876 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7850, loss[loss=0.0693, simple_loss=0.09885, pruned_loss=0.01281, audio_tagging_loss=0.007072, over 15174.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09068, pruned_loss=0.01257, audio_tagging_loss=0.00862, over 3038172.10 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:47:40,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.35 vs. 
limit=15.0 2023-11-28 07:48:17,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3419073.3333333335, ans=0.125 2023-11-28 07:48:27,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3419073.3333333335, ans=0.0 2023-11-28 07:50:32,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512900 2023-11-28 07:50:44,764 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7900, loss[loss=0.06916, simple_loss=0.1106, pruned_loss=0.007226, audio_tagging_loss=0.006642, over 14983.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09033, pruned_loss=0.01251, audio_tagging_loss=0.00876, over 3037488.64 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:50:54,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3419340.0, ans=0.125 2023-11-28 07:51:25,725 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.969e+01 8.861e+01 9.655e+01 1.039e+02 1.530e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 07:51:37,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3419406.6666666665, ans=0.125 2023-11-28 07:52:33,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3419540.0, ans=0.0 2023-11-28 07:53:38,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3419606.6666666665, ans=0.0 2023-11-28 07:53:42,634 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512950 2023-11-28 07:53:42,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3419606.6666666665, ans=0.0 2023-11-28 07:53:45,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3419606.6666666665, ans=0.0 2023-11-28 07:53:52,956 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7950, loss[loss=0.07636, simple_loss=0.1045, pruned_loss=0.01541, audio_tagging_loss=0.008689, over 17169.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.0902, pruned_loss=0.0126, audio_tagging_loss=0.008857, over 3044658.77 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:54:58,296 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:56:02,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3419873.3333333335, ans=0.125 2023-11-28 07:56:19,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. 
limit=12.0 2023-11-28 07:57:24,708 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513000 2023-11-28 07:57:41,037 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8000, loss[loss=0.05788, simple_loss=0.07121, pruned_loss=0.0112, audio_tagging_loss=0.01108, over 14351.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08872, pruned_loss=0.01241, audio_tagging_loss=0.009, over 3044826.24 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:58:27,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.711e+01 9.409e+01 1.028e+02 1.220e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 07:58:54,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3420073.3333333335, ans=0.1 2023-11-28 07:59:04,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3420140.0, ans=0.1 2023-11-28 07:59:10,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2023-11-28 08:01:07,293 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513050 2023-11-28 08:01:19,596 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8050, loss[loss=0.05763, simple_loss=0.07687, pruned_loss=0.009988, audio_tagging_loss=0.00921, over 14142.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08885, pruned_loss=0.01231, audio_tagging_loss=0.009066, over 3047217.96 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:01:23,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3420340.0, ans=0.0 2023-11-28 08:02:59,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3420540.0, ans=0.125 2023-11-28 08:04:01,851 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513100 2023-11-28 08:04:11,733 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8100, loss[loss=0.06799, simple_loss=0.07916, pruned_loss=0.01244, audio_tagging_loss=0.01596, over 13930.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08903, pruned_loss=0.01239, audio_tagging_loss=0.009007, over 3047672.07 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:04:51,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.964e+01 9.574e+01 1.024e+02 1.325e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 08:05:56,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3420806.6666666665, ans=0.0 2023-11-28 08:06:08,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3420873.3333333335, ans=0.125 2023-11-28 08:06:50,833 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:06:51,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2023-11-28 08:07:02,426 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513150 2023-11-28 08:07:09,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.41 vs. 
2023-11-28 08:07:11,811 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8150, loss[loss=0.06712, simple_loss=0.0923, pruned_loss=0.01072, audio_tagging_loss=0.01024, over 14522.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09056, pruned_loss=0.01267, audio_tagging_loss=0.008708, over 3051507.34 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 08:07:17,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3421006.6666666665, ans=0.0
2023-11-28 08:07:30,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3421006.6666666665, ans=0.035
2023-11-28 08:07:31,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0
2023-11-28 08:07:49,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3421073.3333333335, ans=0.125
2023-11-28 08:08:40,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3421140.0, ans=0.125
2023-11-28 08:08:48,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3421140.0, ans=0.0
2023-11-28 08:10:00,142 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513200
2023-11-28 08:10:09,139 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8200, loss[loss=0.06357, simple_loss=0.09029, pruned_loss=0.01247, audio_tagging_loss=0.005949, over 15344.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09129, pruned_loss=0.01277, audio_tagging_loss=0.008523, over 3049665.84 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 08:10:20,907 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 08:10:32,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3421340.0, ans=0.0
2023-11-28 08:10:48,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.671e+01 9.315e+01 1.033e+02 1.596e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-28 08:10:48,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3421406.6666666665, ans=0.125
2023-11-28 08:10:53,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3421406.6666666665, ans=0.125
2023-11-28 08:12:23,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs.
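limit=15.0

The WARNING above shows the filter that keeps AudioSet clips out of the ASR branch: these cuts carry only the dummy transcript ("Dummy text added as a place holder. ..."), and a 100-frame (1 s) clip shrinks to 23 frames after the roughly 4x convolutional subsampling, which is shorter than its 24 BPE tokens, so the cut is excluded from the transducer loss. A sketch of such a check (hypothetical helper; the subsampling arithmetic below is one plausible formula that reproduces 100 -> 23):

    # Drop cuts whose encoder output would be shorter than the token
    # sequence; the logged warning fires exactly when t_after < num_tokens.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Conv front-end plus downsampling: (100 - 7) // 2 = 46,
        # then (46 + 1) // 2 = 23, matching "after subsampling: 23".
        t_after = ((num_frames - 7) // 2 + 1) // 2
        return t_after >= num_tokens

    print(keep_cut(100, 24))  # False -> "Exclude cut ... from training."
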
2023-11-28 08:12:26,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3421606.6666666665, ans=0.125
2023-11-28 08:12:52,945 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513250
2023-11-28 08:13:04,390 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8250, loss[loss=0.06833, simple_loss=0.09026, pruned_loss=0.01316, audio_tagging_loss=0.01003, over 13970.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08974, pruned_loss=0.01263, audio_tagging_loss=0.008638, over 3046651.71 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 08:13:05,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0
2023-11-28 08:13:11,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3421673.3333333335, ans=0.0
2023-11-28 08:13:13,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3421673.3333333335, ans=0.2
2023-11-28 08:13:23,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3421673.3333333335, ans=0.125
2023-11-28 08:13:51,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3421740.0, ans=0.0
2023-11-28 08:14:38,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0
2023-11-28 08:15:06,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0
2023-11-28 08:15:23,301 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 08:15:54,319 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513300
2023-11-28 08:16:08,434 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8300, loss[loss=0.06743, simple_loss=0.08794, pruned_loss=0.01316, audio_tagging_loss=0.0103, over 15126.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08997, pruned_loss=0.01263, audio_tagging_loss=0.008776, over 3045209.82 frames.
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:16:49,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.832e+01 9.492e+01 1.019e+02 1.242e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 08:17:42,671 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:18:29,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3422206.6666666665, ans=0.2 2023-11-28 08:18:33,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3422206.6666666665, ans=0.1 2023-11-28 08:18:39,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3422273.3333333335, ans=0.125 2023-11-28 08:19:10,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513350 2023-11-28 08:19:20,926 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8350, loss[loss=0.05111, simple_loss=0.07715, pruned_loss=0.007002, audio_tagging_loss=0.005529, over 13878.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09014, pruned_loss=0.01265, audio_tagging_loss=0.008682, over 3039476.51 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:19:21,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3422340.0, ans=0.0 2023-11-28 08:20:26,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2023-11-28 08:20:29,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3422473.3333333335, ans=0.125 2023-11-28 08:20:49,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3422473.3333333335, ans=0.07 2023-11-28 08:20:58,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3422540.0, ans=15.0 2023-11-28 08:21:56,030 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513400 2023-11-28 08:22:05,649 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8400, loss[loss=0.07273, simple_loss=0.1046, pruned_loss=0.01387, audio_tagging_loss=0.006581, over 15782.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08951, pruned_loss=0.01261, audio_tagging_loss=0.008631, over 3034287.55 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:22:35,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.870e+01 9.331e+01 1.011e+02 1.281e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 08:22:45,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3422740.0, ans=0.0 2023-11-28 08:22:45,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3422740.0, ans=0.0 2023-11-28 08:23:03,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422806.6666666665, ans=0.1 2023-11-28 08:23:07,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.51 vs. 
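limit=12.0

The ScheduledFloat lines record hyper-parameters of individual modules (balancer probabilities, skip rates, bypass scales, and even the whitening_limit values that feed the Whitening checks) that are scheduled as piecewise-linear functions of the global batch count rather than fixed constants; by batch_count ~ 3.42e6 they have long since reached their final plateau values, which is why each name always reports the same ans. A minimal sketch of that kind of schedule (breakpoints invented for illustration, not the library's actual class):

    # Piecewise-linear value over batch count, of the kind the
    # ScheduledFloat lines report: here 0.3 until 4k batches, then a
    # linear ramp down to a 0.125 plateau from 20k batches onward.
    def scheduled_float(batch_count: float,
                        points=((0.0, 0.3), (4000.0, 0.3), (20000.0, 0.125))) -> float:
        x0, y0 = points[0]
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return points[-1][1]  # past the last breakpoint: constant

    print(scheduled_float(3422540.0))  # 0.125, the plateau behaviour seen above
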
2023-11-28 08:23:27,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3422873.3333333335, ans=0.125
2023-11-28 08:24:17,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=22.5
2023-11-28 08:24:19,151 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513450
2023-11-28 08:24:26,970 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8450, loss[loss=0.06802, simple_loss=0.09441, pruned_loss=0.01207, audio_tagging_loss=0.008746, over 14983.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08987, pruned_loss=0.01255, audio_tagging_loss=0.008661, over 3043145.26 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0
2023-11-28 08:25:08,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3423073.3333333335, ans=0.2
2023-11-28 08:25:13,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3423140.0, ans=0.1
2023-11-28 08:25:44,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3423206.6666666665, ans=0.2
2023-11-28 08:26:29,625 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513500
2023-11-28 08:26:32,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3423273.3333333335, ans=0.125
2023-11-28 08:26:35,511 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8500, loss[loss=0.05808, simple_loss=0.08084, pruned_loss=0.009792, audio_tagging_loss=0.00787, over 16722.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08984, pruned_loss=0.01262, audio_tagging_loss=0.008682, over 3050300.24 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0
2023-11-28 08:27:06,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.890e+01 9.437e+01 1.019e+02 2.913e+02, threshold=1.887e+02, percent-clipped=1.0
2023-11-28 08:27:37,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3423473.3333333335, ans=0.2
2023-11-28 08:28:36,232 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513550
2023-11-28 08:28:44,640 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8550, loss[loss=0.07126, simple_loss=0.1004, pruned_loss=0.01373, audio_tagging_loss=0.00734, over 14548.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.0903, pruned_loss=0.01245, audio_tagging_loss=0.0087, over 3051576.11 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0
2023-11-28 08:28:55,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3423673.3333333335, ans=0.2
2023-11-28 08:29:13,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3423740.0, ans=0.125
2023-11-28 08:29:50,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3423806.6666666665, ans=0.0
2023-11-28 08:30:21,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs.
limit=15.0 2023-11-28 08:30:26,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2023-11-28 08:30:30,596 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513600 2023-11-28 08:30:37,853 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8600, loss[loss=0.07248, simple_loss=0.1028, pruned_loss=0.01271, audio_tagging_loss=0.008372, over 14634.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09126, pruned_loss=0.01254, audio_tagging_loss=0.008669, over 3052535.77 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:30:57,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.892e+01 9.588e+01 1.028e+02 1.351e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 08:31:20,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3424140.0, ans=0.0 2023-11-28 08:31:35,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=22.5 2023-11-28 08:32:02,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3424273.3333333335, ans=0.125 2023-11-28 08:32:04,896 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:32:09,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513650 2023-11-28 08:32:14,171 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8650, loss[loss=0.06074, simple_loss=0.07613, pruned_loss=0.01238, audio_tagging_loss=0.01029, over 14576.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09202, pruned_loss=0.01267, audio_tagging_loss=0.008659, over 3054556.48 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:32:30,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-28 08:33:21,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3424540.0, ans=0.0 2023-11-28 08:33:27,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5 2023-11-28 08:33:45,384 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513700 2023-11-28 08:33:50,731 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8700, loss[loss=0.0654, simple_loss=0.09139, pruned_loss=0.009313, audio_tagging_loss=0.01039, over 14946.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09172, pruned_loss=0.01259, audio_tagging_loss=0.008747, over 3059869.05 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:34:04,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3424673.3333333335, ans=0.125 2023-11-28 08:34:13,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.836e+01 9.429e+01 1.013e+02 1.223e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 08:34:16,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. 
limit=22.5 2023-11-28 08:34:27,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3424806.6666666665, ans=0.125 2023-11-28 08:34:32,021 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:35:15,320 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513750 2023-11-28 08:35:20,359 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8750, loss[loss=0.06572, simple_loss=0.08179, pruned_loss=0.01423, audio_tagging_loss=0.01059, over 16426.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09218, pruned_loss=0.01265, audio_tagging_loss=0.008824, over 3054198.08 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:35:25,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3425006.6666666665, ans=0.07 2023-11-28 08:35:34,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3425006.6666666665, ans=12.0 2023-11-28 08:35:39,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3425073.3333333335, ans=0.2 2023-11-28 08:35:50,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2023-11-28 08:35:56,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3425073.3333333335, ans=0.125 2023-11-28 08:36:29,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3425206.6666666665, ans=0.0 2023-11-28 08:36:44,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2023-11-28 08:36:45,777 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:36:48,696 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513800 2023-11-28 08:36:54,342 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8800, loss[loss=0.06046, simple_loss=0.09213, pruned_loss=0.008539, audio_tagging_loss=0.005854, over 14817.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09194, pruned_loss=0.01253, audio_tagging_loss=0.008947, over 3051759.28 frames. 
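], batch size: 53, lr: 1.57e-03, grad_scale: 32.0

The grad_scale field cycling through 8.0, 16.0 and 32.0 is dynamic loss scaling for mixed-precision training: the scaler halves the scale when a step produces inf/nan gradients (as between batches 8800 and 8850 here, 32.0 -> 16.0) and doubles it again after a long enough run of clean steps. The training script may use its own scaler class; the sketch below shows the standard PyTorch mechanism with the same behaviour (growth_interval is a guess):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                       growth_factor=2.0,   # 16.0 -> 32.0 after clean steps
                                       backoff_factor=0.5,  # 32.0 -> 16.0 on overflow
                                       growth_interval=500)

    # Inside the training loop (model, optimizer, batch assumed to exist):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)   # skipped internally if grads overflowed
    #   scaler.update()          # adjusts the scale logged as grad_scale
    #   print(scaler.get_scale())
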
2023-11-28 08:37:07,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3425340.0, ans=0.125
2023-11-28 08:37:10,627 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 08:37:13,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.831e+01 9.235e+01 9.998e+01 1.254e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-28 08:37:27,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425473.3333333335, ans=0.1
2023-11-28 08:37:40,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3425473.3333333335, ans=0.125
2023-11-28 08:37:44,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.64 vs. limit=15.0
2023-11-28 08:38:07,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.49 vs. limit=22.5
2023-11-28 08:38:10,651 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513850
2023-11-28 08:38:15,007 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8850, loss[loss=0.07772, simple_loss=0.09621, pruned_loss=0.01635, audio_tagging_loss=0.01327, over 15138.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09155, pruned_loss=0.01263, audio_tagging_loss=0.008917, over 3054339.35 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-28 08:38:22,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3425673.3333333335, ans=0.125
2023-11-28 08:38:28,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3425673.3333333335, ans=0.125
2023-11-28 08:38:30,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3425740.0, ans=0.05
2023-11-28 08:38:36,548 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-28 08:38:44,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3425740.0, ans=0.125 2023-11-28 08:39:00,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3425806.6666666665, ans=0.125 2023-11-28 08:39:21,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3425940.0, ans=0.05 2023-11-28 08:39:22,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3425940.0, ans=0.125 2023-11-28 08:39:31,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513900 2023-11-28 08:39:36,045 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8900, loss[loss=0.04691, simple_loss=0.05843, pruned_loss=0.007586, audio_tagging_loss=0.01011, over 15080.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09149, pruned_loss=0.01267, audio_tagging_loss=0.008815, over 3061109.12 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:39:45,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3426006.6666666665, ans=0.125 2023-11-28 08:39:57,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.722e+01 9.445e+01 1.012e+02 1.187e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 08:40:09,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-11-28 08:40:23,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=15.0 2023-11-28 08:40:46,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513950 2023-11-28 08:40:50,563 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8950, loss[loss=0.05957, simple_loss=0.08325, pruned_loss=0.009333, audio_tagging_loss=0.008609, over 15085.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09152, pruned_loss=0.01265, audio_tagging_loss=0.008681, over 3053626.27 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:40:53,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3426340.0, ans=0.125 2023-11-28 08:41:02,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3426406.6666666665, ans=0.2 2023-11-28 08:41:09,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=12.0 2023-11-28 08:41:19,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-28 08:41:40,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3426540.0, ans=0.1 2023-11-28 08:41:44,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.67 vs. 
limit=15.0 2023-11-28 08:41:51,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-11-28 08:41:52,887 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514000 2023-11-28 08:41:53,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3426606.6666666665, ans=0.0 2023-11-28 08:41:57,096 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9000, loss[loss=0.05511, simple_loss=0.07228, pruned_loss=0.009033, audio_tagging_loss=0.009937, over 13867.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09081, pruned_loss=0.01254, audio_tagging_loss=0.008684, over 3050168.68 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:41:57,097 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 08:42:21,585 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9709, 3.2024, 2.8899, 3.2529, 3.3890, 2.8057, 3.4026, 2.6443], device='cuda:3') 2023-11-28 08:42:28,689 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1163, 2.4880, 5.0015, 3.0020], device='cuda:3') 2023-11-28 08:42:30,462 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0161, 2.7645, 1.6848, 2.6894, 3.3055, 3.2352, 3.2559, 3.5667], device='cuda:3') 2023-11-28 08:42:35,445 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.05867, simple_loss=0.05056, pruned_loss=0.005241, audio_tagging_loss=0.02815, over 4681554.00 frames. 2023-11-28 08:42:35,446 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 08:42:43,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3426673.3333333335, ans=0.125 2023-11-28 08:42:53,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3426740.0, ans=0.125 2023-11-28 08:42:53,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.891e+01 9.730e+01 1.046e+02 2.169e+02, threshold=1.946e+02, percent-clipped=1.0 2023-11-28 08:42:54,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3426740.0, ans=0.125 2023-11-28 08:43:12,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3426806.6666666665, ans=0.1 2023-11-28 08:43:13,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-11-28 08:43:18,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.88 vs. 
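limit=12.0

The validation block above also dumps attn_weights_entropy: one entropy value per attention head, a diagnostic that flags heads which have collapsed to near-deterministic attention (entropy near 0) or stayed close to uniform (entropy near the log of the number of keys). A sketch of how such a per-head statistic can be computed (illustrative, not zipformer's exact code):

    import torch

    # Entropy of the attention distribution, averaged over query
    # positions; returns one value per head, like the logged tensors.
    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (num_heads, num_queries, num_keys), rows sum to 1.
        eps = 1.0e-20
        ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return ent.mean(dim=-1)

    w = torch.softmax(torch.randn(4, 10, 50), dim=-1)
    print(attn_weights_entropy(w))  # tensor of 4 values, one per head
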
2023-11-28 08:43:26,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3426940.0, ans=0.0
2023-11-28 08:43:30,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3426940.0, ans=0.125
2023-11-28 08:43:36,101 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514050
2023-11-28 08:43:40,590 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9050, loss[loss=0.07306, simple_loss=0.111, pruned_loss=0.009853, audio_tagging_loss=0.007707, over 15000.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09117, pruned_loss=0.01255, audio_tagging_loss=0.008672, over 3046887.60 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0
2023-11-28 08:43:42,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3427006.6666666665, ans=0.0
2023-11-28 08:43:43,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3427006.6666666665, ans=0.04949747468305833
2023-11-28 08:43:48,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3427006.6666666665, ans=0.1
2023-11-28 08:43:49,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=12.0
2023-11-28 08:43:54,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0
2023-11-28 08:44:17,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3427206.6666666665, ans=0.0
2023-11-28 08:44:31,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0
2023-11-28 08:44:36,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0
2023-11-28 08:44:39,472 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514100
2023-11-28 08:44:43,137 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9100, loss[loss=0.06261, simple_loss=0.08696, pruned_loss=0.008868, audio_tagging_loss=0.01027, over 16126.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09149, pruned_loss=0.01261, audio_tagging_loss=0.008627, over 3058370.30 frames.
], batch size: 60, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:45:01,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 9.021e+01 9.381e+01 1.003e+02 1.228e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 08:45:03,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3427406.6666666665, ans=0.1 2023-11-28 08:45:10,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3427473.3333333335, ans=0.125 2023-11-28 08:45:11,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3427473.3333333335, ans=0.125 2023-11-28 08:45:16,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3427473.3333333335, ans=0.0 2023-11-28 08:45:21,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2023-11-28 08:45:23,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.32 vs. limit=5.0 2023-11-28 08:45:29,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3427540.0, ans=10.0 2023-11-28 08:45:36,299 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:45:40,587 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514150 2023-11-28 08:45:44,493 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9150, loss[loss=0.06389, simple_loss=0.09449, pruned_loss=0.007457, audio_tagging_loss=0.009187, over 16410.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09187, pruned_loss=0.01253, audio_tagging_loss=0.008534, over 3060740.82 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:45:55,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3427740.0, ans=0.1 2023-11-28 08:45:55,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3427740.0, ans=0.125 2023-11-28 08:46:03,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=22.5 2023-11-28 08:46:28,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3427873.3333333335, ans=0.125 2023-11-28 08:46:30,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3427940.0, ans=0.0 2023-11-28 08:46:39,298 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514200 2023-11-28 08:46:39,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=15.0 2023-11-28 08:46:41,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.39 vs. limit=15.0 2023-11-28 08:46:42,863 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9200, loss[loss=0.06314, simple_loss=0.08329, pruned_loss=0.01184, audio_tagging_loss=0.009657, over 15888.00 frames. 
], tot_loss[loss=0.06707, simple_loss=0.09206, pruned_loss=0.01257, audio_tagging_loss=0.008478, over 3057922.31 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:46:54,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3428073.3333333335, ans=0.1 2023-11-28 08:46:58,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.605e+01 9.339e+01 9.879e+01 1.258e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 08:47:22,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. limit=10.0 2023-11-28 08:47:32,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-28 08:47:37,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514250 2023-11-28 08:47:40,305 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9250, loss[loss=0.06891, simple_loss=0.08959, pruned_loss=0.01602, audio_tagging_loss=0.008092, over 14577.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09189, pruned_loss=0.01255, audio_tagging_loss=0.008377, over 3063372.36 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:47:42,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3428340.0, ans=0.1 2023-11-28 08:48:01,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2023-11-28 08:48:21,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3428540.0, ans=0.125 2023-11-28 08:48:31,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3428606.6666666665, ans=0.2 2023-11-28 08:48:34,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-11-28 08:48:34,867 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514300 2023-11-28 08:48:38,060 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9300, loss[loss=0.06471, simple_loss=0.09403, pruned_loss=0.01019, audio_tagging_loss=0.007511, over 13901.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09088, pruned_loss=0.01241, audio_tagging_loss=0.008469, over 3053183.99 frames. 
], batch size: 52, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:48:47,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3428673.3333333335, ans=0.1 2023-11-28 08:48:54,142 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.614e+01 9.246e+01 9.788e+01 1.593e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-28 08:48:56,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3428740.0, ans=0.125 2023-11-28 08:49:32,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514350 2023-11-28 08:49:35,350 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9350, loss[loss=0.06308, simple_loss=0.09057, pruned_loss=0.0103, audio_tagging_loss=0.007494, over 15989.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08972, pruned_loss=0.01215, audio_tagging_loss=0.008573, over 3046262.22 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:49:43,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3429006.6666666665, ans=0.0 2023-11-28 08:49:48,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3429073.3333333335, ans=10.0 2023-11-28 08:50:02,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3429140.0, ans=0.125 2023-11-28 08:50:05,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3429140.0, ans=0.04949747468305833 2023-11-28 08:50:11,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2023-11-28 08:50:26,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3429273.3333333335, ans=0.0 2023-11-28 08:50:28,896 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514400 2023-11-28 08:50:32,405 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9400, loss[loss=0.06743, simple_loss=0.09131, pruned_loss=0.01171, audio_tagging_loss=0.01007, over 14283.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09045, pruned_loss=0.0124, audio_tagging_loss=0.008653, over 3052357.83 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:50:45,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3429406.6666666665, ans=0.125 2023-11-28 08:50:47,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3429406.6666666665, ans=0.0 2023-11-28 08:50:48,715 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.921e+01 9.569e+01 1.013e+02 1.910e+02, threshold=1.914e+02, percent-clipped=1.0 2023-11-28 08:50:49,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3429406.6666666665, ans=0.125 2023-11-28 08:50:54,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. 
limit=15.0 2023-11-28 08:50:54,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=12.0 2023-11-28 08:50:56,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3429473.3333333335, ans=0.125 2023-11-28 08:51:03,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3429473.3333333335, ans=0.125 2023-11-28 08:51:04,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3429473.3333333335, ans=0.0 2023-11-28 08:51:19,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3429606.6666666665, ans=0.2 2023-11-28 08:51:22,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3429606.6666666665, ans=10.0 2023-11-28 08:51:27,022 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514450 2023-11-28 08:51:30,081 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9450, loss[loss=0.0565, simple_loss=0.07893, pruned_loss=0.008222, audio_tagging_loss=0.008811, over 14112.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09028, pruned_loss=0.01245, audio_tagging_loss=0.008727, over 3049925.74 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:51:32,364 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:51:37,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3429673.3333333335, ans=0.125 2023-11-28 08:52:00,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2023-11-28 08:52:12,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3429873.3333333335, ans=0.0 2023-11-28 08:52:23,992 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514500 2023-11-28 08:52:27,123 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9500, loss[loss=0.07963, simple_loss=0.1122, pruned_loss=0.01619, audio_tagging_loss=0.007342, over 16317.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09026, pruned_loss=0.01247, audio_tagging_loss=0.008799, over 3055840.12 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:52:39,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. 
limit=22.5 2023-11-28 08:52:42,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 9.109e+01 9.581e+01 1.028e+02 2.016e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-28 08:52:54,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3430140.0, ans=0.125 2023-11-28 08:53:06,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3430206.6666666665, ans=0.2 2023-11-28 08:53:18,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3430273.3333333335, ans=0.125 2023-11-28 08:53:20,690 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514550 2023-11-28 08:53:23,772 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9550, loss[loss=0.06737, simple_loss=0.09231, pruned_loss=0.01236, audio_tagging_loss=0.008857, over 14368.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09009, pruned_loss=0.01238, audio_tagging_loss=0.008882, over 3054658.74 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:53:28,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3430340.0, ans=0.0 2023-11-28 08:53:44,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2023-11-28 08:53:51,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3430473.3333333335, ans=0.1 2023-11-28 08:53:58,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3430540.0, ans=10.0 2023-11-28 08:54:02,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3430540.0, ans=0.0 2023-11-28 08:54:11,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-28 08:54:13,920 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:54:14,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3430606.6666666665, ans=0.0 2023-11-28 08:54:17,652 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514600 2023-11-28 08:54:21,545 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9600, loss[loss=0.05076, simple_loss=0.07019, pruned_loss=0.007617, audio_tagging_loss=0.008051, over 16109.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08969, pruned_loss=0.01232, audio_tagging_loss=0.008923, over 3053536.14 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:54:25,076 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:54:37,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.954e+01 8.927e+01 9.333e+01 1.014e+02 1.212e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 08:54:50,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.69 vs. 
limit=15.0 2023-11-28 08:54:52,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.16 vs. limit=22.5 2023-11-28 08:55:05,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.35 vs. limit=10.0 2023-11-28 08:55:07,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3430940.0, ans=0.1 2023-11-28 08:55:11,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3430940.0, ans=0.1 2023-11-28 08:55:15,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3430940.0, ans=0.125 2023-11-28 08:55:16,151 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514650 2023-11-28 08:55:19,454 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9650, loss[loss=0.07072, simple_loss=0.108, pruned_loss=0.01136, audio_tagging_loss=0.005348, over 15223.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09013, pruned_loss=0.01258, audio_tagging_loss=0.008917, over 3054071.92 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:55:21,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3431006.6666666665, ans=0.0 2023-11-28 08:55:25,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3431006.6666666665, ans=0.125 2023-11-28 08:56:05,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3431273.3333333335, ans=0.125 2023-11-28 08:56:13,092 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514700 2023-11-28 08:56:16,243 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9700, loss[loss=0.07579, simple_loss=0.1012, pruned_loss=0.01389, audio_tagging_loss=0.01132, over 15434.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09059, pruned_loss=0.01264, audio_tagging_loss=0.008775, over 3054531.16 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:56:25,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3431340.0, ans=0.125 2023-11-28 08:56:32,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.860e+01 9.541e+01 1.023e+02 1.192e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 08:56:40,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3431473.3333333335, ans=0.125 2023-11-28 08:56:59,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2023-11-28 08:57:05,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3431606.6666666665, ans=0.07 2023-11-28 08:57:09,649 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514750 2023-11-28 08:57:10,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3431606.6666666665, ans=0.07 2023-11-28 08:57:13,582 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9750, loss[loss=0.04217, simple_loss=0.05348, pruned_loss=0.005643, audio_tagging_loss=0.009784, over 15995.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08996, pruned_loss=0.01256, audio_tagging_loss=0.00864, over 3050393.01 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:57:15,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.90 vs. limit=22.5 2023-11-28 08:57:16,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3431673.3333333335, ans=0.125 2023-11-28 08:57:19,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0 2023-11-28 08:57:21,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3431673.3333333335, ans=0.125 2023-11-28 08:57:24,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3431673.3333333335, ans=0.2 2023-11-28 08:57:31,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3431740.0, ans=0.125 2023-11-28 08:57:48,014 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:58:07,898 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514800 2023-11-28 08:58:11,283 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9800, loss[loss=0.05016, simple_loss=0.0679, pruned_loss=0.007601, audio_tagging_loss=0.008609, over 15633.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09036, pruned_loss=0.01266, audio_tagging_loss=0.008489, over 3050474.33 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:58:15,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3432006.6666666665, ans=0.2 2023-11-28 08:58:27,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.900e+01 9.501e+01 1.026e+02 1.176e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 08:58:30,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3432073.3333333335, ans=0.09899494936611666 2023-11-28 08:58:32,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-28 08:59:05,106 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514850 2023-11-28 08:59:06,139 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:59:08,271 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9850, loss[loss=0.07985, simple_loss=0.1063, pruned_loss=0.01836, audio_tagging_loss=0.008341, over 15327.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08979, pruned_loss=0.0125, audio_tagging_loss=0.008441, over 3045273.30 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:59:37,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3432473.3333333335, ans=0.04949747468305833 2023-11-28 08:59:51,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.45 vs. limit=10.0 2023-11-28 08:59:51,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3432540.0, ans=0.0 2023-11-28 08:59:59,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3432606.6666666665, ans=0.025 2023-11-28 09:00:01,453 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514900 2023-11-28 09:00:04,608 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9900, loss[loss=0.05836, simple_loss=0.07465, pruned_loss=0.0126, audio_tagging_loss=0.008429, over 14562.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08997, pruned_loss=0.01251, audio_tagging_loss=0.008454, over 3045848.24 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 09:00:23,100 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.866e+01 9.531e+01 1.026e+02 1.362e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 09:00:43,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2023-11-28 09:00:48,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2023-11-28 09:00:49,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-28 09:00:59,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514950 2023-11-28 09:01:03,287 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9950, loss[loss=0.04743, simple_loss=0.06051, pruned_loss=0.01066, audio_tagging_loss=0.006516, over 14003.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09011, pruned_loss=0.01247, audio_tagging_loss=0.008452, over 3045243.24 frames. 
], batch size: 53, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:01:36,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3433206.6666666665, ans=0.125 2023-11-28 09:01:36,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3433206.6666666665, ans=0.125 2023-11-28 09:01:38,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3433206.6666666665, ans=0.0 2023-11-28 09:01:57,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515000 2023-11-28 09:02:00,847 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10000, loss[loss=0.04782, simple_loss=0.06713, pruned_loss=0.005632, audio_tagging_loss=0.008628, over 14819.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.0899, pruned_loss=0.01238, audio_tagging_loss=0.008486, over 3045246.35 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:02:08,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3433340.0, ans=0.125 2023-11-28 09:02:18,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.838e+01 9.507e+01 1.055e+02 1.169e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 09:02:47,882 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:02:48,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3433606.6666666665, ans=0.0 2023-11-28 09:02:54,302 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515050 2023-11-28 09:02:56,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3433673.3333333335, ans=0.125 2023-11-28 09:02:57,634 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10050, loss[loss=0.0474, simple_loss=0.05623, pruned_loss=0.009668, audio_tagging_loss=0.009619, over 15632.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08947, pruned_loss=0.0122, audio_tagging_loss=0.008461, over 3041832.97 frames. ], batch size: 65, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:03:09,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2023-11-28 09:03:14,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2023-11-28 09:03:38,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2023-11-28 09:03:43,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3433940.0, ans=0.0 2023-11-28 09:03:51,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515100 2023-11-28 09:03:55,266 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10100, loss[loss=0.06198, simple_loss=0.08682, pruned_loss=0.009349, audio_tagging_loss=0.009227, over 15348.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0903, pruned_loss=0.01236, audio_tagging_loss=0.008529, over 3046866.33 frames. 
], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:03:56,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3434006.6666666665, ans=0.125 2023-11-28 09:04:11,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3434073.3333333335, ans=0.2 2023-11-28 09:04:13,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.780e+01 9.411e+01 9.939e+01 1.267e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 09:04:15,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3434073.3333333335, ans=0.0 2023-11-28 09:04:24,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3434140.0, ans=0.125 2023-11-28 09:04:28,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-11-28 09:04:45,962 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:04:49,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515150 2023-11-28 09:04:53,045 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10150, loss[loss=0.06998, simple_loss=0.08903, pruned_loss=0.01557, audio_tagging_loss=0.009895, over 16656.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09035, pruned_loss=0.01226, audio_tagging_loss=0.008502, over 3053899.87 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:04:53,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-28 09:05:00,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3434340.0, ans=0.0 2023-11-28 09:05:15,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-28 09:05:23,668 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:05:25,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-28 09:05:27,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.69 vs. 
limit=15.0 2023-11-28 09:05:30,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3434540.0, ans=0.1 2023-11-28 09:05:32,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3434540.0, ans=0.1 2023-11-28 09:05:45,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515200 2023-11-28 09:05:49,223 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10200, loss[loss=0.07205, simple_loss=0.1008, pruned_loss=0.01314, audio_tagging_loss=0.008495, over 16430.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08983, pruned_loss=0.01217, audio_tagging_loss=0.008585, over 3051123.50 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:05:52,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3434673.3333333335, ans=0.0 2023-11-28 09:06:06,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3434740.0, ans=0.125 2023-11-28 09:06:08,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.883e+01 9.493e+01 1.013e+02 1.248e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:06:14,798 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:06:19,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3434806.6666666665, ans=0.2 2023-11-28 09:06:20,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3434806.6666666665, ans=0.125 2023-11-28 09:06:33,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.66 vs. limit=22.5 2023-11-28 09:06:33,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3434940.0, ans=0.0 2023-11-28 09:06:43,212 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515250 2023-11-28 09:06:45,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3435006.6666666665, ans=0.025 2023-11-28 09:06:46,373 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10250, loss[loss=0.05364, simple_loss=0.07, pruned_loss=0.008497, audio_tagging_loss=0.01014, over 14542.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09012, pruned_loss=0.01224, audio_tagging_loss=0.008605, over 3059153.75 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:07:00,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3435073.3333333335, ans=0.125 2023-11-28 09:07:13,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.47 vs. 
limit=12.0 2023-11-28 09:07:28,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2023-11-28 09:07:40,789 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515300 2023-11-28 09:07:40,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3435273.3333333335, ans=0.1 2023-11-28 09:07:44,048 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10300, loss[loss=0.07559, simple_loss=0.1039, pruned_loss=0.01376, audio_tagging_loss=0.009901, over 16246.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08941, pruned_loss=0.01221, audio_tagging_loss=0.008774, over 3054883.02 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:08:01,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 9.048e+01 9.599e+01 1.061e+02 1.681e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 09:08:17,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2023-11-28 09:08:37,145 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515350 2023-11-28 09:08:40,331 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10350, loss[loss=0.0929, simple_loss=0.1332, pruned_loss=0.0192, audio_tagging_loss=0.007093, over 15311.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.0897, pruned_loss=0.01223, audio_tagging_loss=0.008829, over 3051069.25 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:08:54,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3435740.0, ans=0.125 2023-11-28 09:09:00,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3435740.0, ans=0.125 2023-11-28 09:09:05,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3435806.6666666665, ans=0.0 2023-11-28 09:09:07,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3435806.6666666665, ans=15.0 2023-11-28 09:09:17,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3435873.3333333335, ans=0.0 2023-11-28 09:09:28,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3435940.0, ans=0.0 2023-11-28 09:09:33,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515400 2023-11-28 09:09:36,952 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10400, loss[loss=0.06339, simple_loss=0.0864, pruned_loss=0.009322, audio_tagging_loss=0.01087, over 15966.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08954, pruned_loss=0.01223, audio_tagging_loss=0.008992, over 3047755.21 frames. 
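The WARNING records above show why the 1-second AudioSet dummy cuts are excluded: 100 input frames survive the roughly 4x convolutional subsampling as only 23 frames (consistent with T' = (T - 7) // 4), which is fewer than the 24 BPE tokens of the placeholder transcript, so the transducer loss has no valid alignment. A sketch of that filter, assuming this is the criterion (the exact icefall check may differ):

```python
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # (100 - 7) // 4 == 23 reproduces the logged before/after frame counts.
    frames_after_subsampling = (num_frames - 7) // subsampling_factor
    # The transducer needs at least one output frame per token, so T' < U is unusable.
    return frames_after_subsampling >= num_tokens

assert keep_cut(100, 23) and not keep_cut(100, 24)  # the excluded dummy cuts above
```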
], batch size: 63, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:09:54,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.993e+01 9.634e+01 1.025e+02 1.288e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 09:10:06,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3436140.0, ans=0.125 2023-11-28 09:10:07,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3436140.0, ans=0.0 2023-11-28 09:10:17,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3436206.6666666665, ans=0.0 2023-11-28 09:10:28,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3436273.3333333335, ans=0.0 2023-11-28 09:10:29,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.87 vs. limit=5.0 2023-11-28 09:10:30,048 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515450 2023-11-28 09:10:33,230 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10450, loss[loss=0.0846, simple_loss=0.1264, pruned_loss=0.01423, audio_tagging_loss=0.007151, over 14896.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0897, pruned_loss=0.01218, audio_tagging_loss=0.008947, over 3048321.84 frames. ], batch size: 52, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:10:36,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0 2023-11-28 09:10:51,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3436406.6666666665, ans=0.125 2023-11-28 09:10:51,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3436406.6666666665, ans=0.1 2023-11-28 09:10:58,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3436473.3333333335, ans=0.125 2023-11-28 09:11:07,494 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:11:26,960 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515500 2023-11-28 09:11:30,124 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10500, loss[loss=0.06434, simple_loss=0.07435, pruned_loss=0.01605, audio_tagging_loss=0.01111, over 15176.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08953, pruned_loss=0.01217, audio_tagging_loss=0.00885, over 3050319.29 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:11:48,943 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.855e+01 9.374e+01 1.019e+02 1.300e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 09:11:55,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. 
limit=15.0 2023-11-28 09:12:06,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3436873.3333333335, ans=0.09899494936611666 2023-11-28 09:12:24,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3436940.0, ans=0.125 2023-11-28 09:12:25,159 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515550 2023-11-28 09:12:27,453 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:12:28,320 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10550, loss[loss=0.06998, simple_loss=0.1025, pruned_loss=0.009551, audio_tagging_loss=0.009158, over 15749.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09044, pruned_loss=0.01243, audio_tagging_loss=0.008704, over 3051999.28 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:12:31,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3437006.6666666665, ans=0.1 2023-11-28 09:12:44,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3437073.3333333335, ans=0.1 2023-11-28 09:13:20,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3437273.3333333335, ans=0.2 2023-11-28 09:13:21,720 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515600 2023-11-28 09:13:25,232 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10600, loss[loss=0.05903, simple_loss=0.07925, pruned_loss=0.01204, audio_tagging_loss=0.007363, over 17150.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09155, pruned_loss=0.01262, audio_tagging_loss=0.0086, over 3062231.35 frames. ], batch size: 64, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:13:27,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3437340.0, ans=0.1 2023-11-28 09:13:31,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-11-28 09:13:38,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3437406.6666666665, ans=0.125 2023-11-28 09:13:42,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 9.109e+01 9.906e+01 1.072e+02 1.462e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-28 09:14:03,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2023-11-28 09:14:03,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3437540.0, ans=0.125 2023-11-28 09:14:17,930 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515650 2023-11-28 09:14:21,254 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10650, loss[loss=0.06444, simple_loss=0.09247, pruned_loss=0.01158, audio_tagging_loss=0.006621, over 14928.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09046, pruned_loss=0.01261, audio_tagging_loss=0.008637, over 3058858.63 frames. 
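The ScheduledFloat records above (skip rates, balancer probabilities, dropout_p, scale_min, ...) each report an `ans` value indexed by the global batch_count; by batch count ≈ 3.44e6 most have settled at final constants (e.g. conv_skip_rate=0.0, dropout_p=0.1, scale_min=0.2). A plausible reading is a piecewise-linear schedule over batch count; the breakpoints below are illustrative only, not taken from scaling.py:

```python
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    # points: (batch_count, value) breakpoints sorted by batch_count; values are
    # interpolated linearly and held constant past the last breakpoint.
    (x0, y0) = points[0]
    if batch_count <= x0:
        return y0
    for (x1, y1) in points[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        (x0, y0) = (x1, y1)
    return y0

# e.g. a skip rate annealed to zero early in training reads 0.0 at this point:
assert scheduled_float(3_436_273.0, [(0.0, 0.5), (20_000.0, 0.0)]) == 0.0
```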
], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:14:21,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-11-28 09:14:25,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3437673.3333333335, ans=0.0 2023-11-28 09:14:26,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3437673.3333333335, ans=0.125 2023-11-28 09:14:27,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-11-28 09:14:34,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3437740.0, ans=0.125 2023-11-28 09:14:45,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3437806.6666666665, ans=0.0 2023-11-28 09:15:02,919 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:15:08,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3437940.0, ans=0.125 2023-11-28 09:15:13,624 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515700 2023-11-28 09:15:15,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3437940.0, ans=10.0 2023-11-28 09:15:17,421 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10700, loss[loss=0.06223, simple_loss=0.08541, pruned_loss=0.0096, audio_tagging_loss=0.009926, over 15557.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09035, pruned_loss=0.01262, audio_tagging_loss=0.008589, over 3054129.77 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:15:33,721 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:15:36,701 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.910e+01 9.467e+01 1.013e+02 1.295e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 09:15:45,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3438140.0, ans=0.0 2023-11-28 09:15:50,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3438206.6666666665, ans=0.125 2023-11-28 09:15:52,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3438206.6666666665, ans=0.0 2023-11-28 09:16:10,831 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515750 2023-11-28 09:16:13,979 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10750, loss[loss=0.05357, simple_loss=0.06509, pruned_loss=0.01102, audio_tagging_loss=0.01, over 14990.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09009, pruned_loss=0.01255, audio_tagging_loss=0.008601, over 3050849.84 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:16:14,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.31 vs. 
limit=22.5 2023-11-28 09:16:16,417 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:16:44,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3438473.3333333335, ans=0.0 2023-11-28 09:16:46,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3438540.0, ans=0.125 2023-11-28 09:16:56,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3438540.0, ans=0.125 2023-11-28 09:17:06,600 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515800 2023-11-28 09:17:10,049 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10800, loss[loss=0.06922, simple_loss=0.08936, pruned_loss=0.01626, audio_tagging_loss=0.00828, over 14668.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09035, pruned_loss=0.01259, audio_tagging_loss=0.008563, over 3053490.77 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:17:21,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3438740.0, ans=0.2 2023-11-28 09:17:29,074 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.659e+01 9.192e+01 9.823e+01 1.353e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-28 09:17:45,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3438873.3333333335, ans=0.0 2023-11-28 09:18:01,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3438940.0, ans=0.1 2023-11-28 09:18:02,526 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515850 2023-11-28 09:18:06,530 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10850, loss[loss=0.0744, simple_loss=0.1001, pruned_loss=0.0174, audio_tagging_loss=0.006955, over 15392.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09122, pruned_loss=0.01259, audio_tagging_loss=0.008521, over 3057689.09 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:18:59,469 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515900 2023-11-28 09:19:02,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3439340.0, ans=0.125 2023-11-28 09:19:03,240 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10900, loss[loss=0.05715, simple_loss=0.08205, pruned_loss=0.008675, audio_tagging_loss=0.007448, over 15997.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09117, pruned_loss=0.0126, audio_tagging_loss=0.008596, over 3058948.06 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:19:03,254 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 09:19:09,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3439340.0, ans=0.125 2023-11-28 09:19:09,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3439340.0, ans=0.07 2023-11-28 09:19:21,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 9.090e+01 9.658e+01 1.040e+02 1.317e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 09:19:29,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3439473.3333333335, ans=0.2 2023-11-28 09:19:34,739 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:19:43,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-28 09:19:43,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=22.5 2023-11-28 09:19:43,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3439540.0, ans=0.05 2023-11-28 09:19:52,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3439606.6666666665, ans=0.0 2023-11-28 09:19:56,330 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515950 2023-11-28 09:19:59,470 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10950, loss[loss=0.07266, simple_loss=0.1057, pruned_loss=0.01239, audio_tagging_loss=0.007431, over 15170.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08988, pruned_loss=0.01234, audio_tagging_loss=0.008637, over 3053066.86 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:20:09,351 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:20:13,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-28 09:20:22,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3439806.6666666665, ans=10.0 2023-11-28 09:20:25,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3439806.6666666665, ans=0.125 2023-11-28 09:20:52,063 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516000 2023-11-28 09:20:57,608 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11000, loss[loss=0.06104, simple_loss=0.08299, pruned_loss=0.01126, audio_tagging_loss=0.008284, over 16378.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08917, pruned_loss=0.01211, audio_tagging_loss=0.008735, over 3045928.11 frames. 
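In the optim.py records above, the clipping threshold tracks the median gradient norm: with Clipping_scale=2.0, the quartile listings satisfy threshold ≈ 2 × median (e.g. 2 × 9.658e+01 ≈ 1.932e+02 just above). A simplified sketch of median-relative clipping, assuming a plain norm history rather than whatever running estimate the optimizer actually keeps:

```python
import torch

def clip_by_median(params, grad_norm_history: list[float],
                   clipping_scale: float = 2.0) -> tuple[float, float]:
    # Global gradient norm over all parameters.
    total_norm = torch.norm(
        torch.stack([p.grad.norm() for p in params if p.grad is not None])
    ).item()
    grad_norm_history.append(total_norm)
    median = sorted(grad_norm_history)[len(grad_norm_history) // 2]
    threshold = clipping_scale * median   # e.g. 2.0 * 9.658e+01 = 1.932e+02
    if total_norm > threshold:            # such batches feed "percent-clipped"
        for p in params:
            if p.grad is not None:
                p.grad.mul_(threshold / total_norm)
    return total_norm, threshold
```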
], batch size: 60, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:20:58,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3440006.6666666665, ans=0.125 2023-11-28 09:21:03,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3440006.6666666665, ans=0.0 2023-11-28 09:21:10,788 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:21:17,808 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.606e+01 9.397e+01 9.983e+01 1.237e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 09:21:43,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3440273.3333333335, ans=0.125 2023-11-28 09:21:51,204 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516050 2023-11-28 09:21:54,931 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11050, loss[loss=0.06932, simple_loss=0.09017, pruned_loss=0.01329, audio_tagging_loss=0.01095, over 15585.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08997, pruned_loss=0.01225, audio_tagging_loss=0.00876, over 3050722.20 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:22:21,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3440473.3333333335, ans=0.2 2023-11-28 09:22:32,291 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:22:42,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3440606.6666666665, ans=0.0 2023-11-28 09:22:48,681 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516100 2023-11-28 09:22:52,004 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11100, loss[loss=0.06138, simple_loss=0.08113, pruned_loss=0.01007, audio_tagging_loss=0.01074, over 15168.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09004, pruned_loss=0.01226, audio_tagging_loss=0.008888, over 3051375.58 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:22:55,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3440673.3333333335, ans=0.125 2023-11-28 09:22:55,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3440673.3333333335, ans=0.125 2023-11-28 09:22:56,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3440673.3333333335, ans=0.125 2023-11-28 09:23:01,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3440673.3333333335, ans=0.0 2023-11-28 09:23:07,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. 
limit=10.0 2023-11-28 09:23:12,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.852e+01 9.435e+01 1.052e+02 1.493e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 09:23:41,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3440940.0, ans=0.035 2023-11-28 09:23:45,852 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516150 2023-11-28 09:23:49,020 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11150, loss[loss=0.06784, simple_loss=0.09799, pruned_loss=0.009087, audio_tagging_loss=0.009752, over 14282.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08987, pruned_loss=0.01221, audio_tagging_loss=0.008901, over 3057801.66 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:23:49,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=22.5 2023-11-28 09:23:49,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-28 09:23:50,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.90 vs. limit=15.0 2023-11-28 09:23:51,367 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:23:53,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0 2023-11-28 09:24:15,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3441140.0, ans=0.2 2023-11-28 09:24:22,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=22.5 2023-11-28 09:24:24,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3441206.6666666665, ans=0.1 2023-11-28 09:24:25,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=8.0 2023-11-28 09:24:43,306 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516200 2023-11-28 09:24:47,383 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11200, loss[loss=0.07751, simple_loss=0.1045, pruned_loss=0.01361, audio_tagging_loss=0.01166, over 14772.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08964, pruned_loss=0.01218, audio_tagging_loss=0.008986, over 3050463.43 frames. 
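The Whitening records above compare a per-module statistic of the feature covariance against a limit (6.0, 8.0, 10.0, 12.0, 15.0 or 22.5 depending on the module). One plausible form of that metric, assumed here rather than copied from scaling.py, is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue: it equals 1.0 for a perfectly white (isotropic) covariance and grows as energy concentrates in a few directions, with a penalty presumably applied only when the metric exceeds its limit:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups as in the logs
    # (e.g. num_groups=4, num_channels=128 for the whiten_keys modules above).
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("tgi,tgj->gij", x, x) / num_frames
    eigs = torch.linalg.eigvalsh(cov)                     # per-group spectra
    ratio = (eigs ** 2).mean(dim=1) / (eigs.mean(dim=1) ** 2 + 1e-20)
    return ratio.mean().item()                            # 1.0 == fully white
```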
], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:25:07,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3441406.6666666665, ans=0.125 2023-11-28 09:25:07,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3441406.6666666665, ans=0.125 2023-11-28 09:25:07,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.684e+01 9.493e+01 1.049e+02 1.376e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:25:12,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3441473.3333333335, ans=0.0 2023-11-28 09:25:27,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3441540.0, ans=0.125 2023-11-28 09:25:35,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3441606.6666666665, ans=0.1 2023-11-28 09:25:39,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2023-11-28 09:25:41,294 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516250 2023-11-28 09:25:41,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3441606.6666666665, ans=0.2 2023-11-28 09:25:44,997 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11250, loss[loss=0.06092, simple_loss=0.07441, pruned_loss=0.01178, audio_tagging_loss=0.01193, over 14698.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08953, pruned_loss=0.01224, audio_tagging_loss=0.008959, over 3055352.55 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:25:52,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3441673.3333333335, ans=0.125 2023-11-28 09:26:15,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3441806.6666666665, ans=0.125 2023-11-28 09:26:38,710 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516300 2023-11-28 09:26:41,908 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11300, loss[loss=0.07539, simple_loss=0.105, pruned_loss=0.01688, audio_tagging_loss=0.006024, over 14843.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08994, pruned_loss=0.0123, audio_tagging_loss=0.008715, over 3054030.44 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:26:58,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3442073.3333333335, ans=0.1 2023-11-28 09:27:00,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.59 vs. 
limit=10.0 2023-11-28 09:27:02,754 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.901e+01 9.622e+01 1.003e+02 2.071e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-28 09:27:20,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3442206.6666666665, ans=0.125 2023-11-28 09:27:24,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3442206.6666666665, ans=0.1 2023-11-28 09:27:35,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516350 2023-11-28 09:27:38,749 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11350, loss[loss=0.05913, simple_loss=0.09017, pruned_loss=0.007702, audio_tagging_loss=0.006341, over 14996.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08917, pruned_loss=0.01214, audio_tagging_loss=0.008607, over 3050155.75 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:27:45,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3442340.0, ans=15.0 2023-11-28 09:27:47,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3442340.0, ans=0.125 2023-11-28 09:28:01,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3442473.3333333335, ans=0.0 2023-11-28 09:28:32,956 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516400 2023-11-28 09:28:35,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3442673.3333333335, ans=0.0 2023-11-28 09:28:36,501 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11400, loss[loss=0.06802, simple_loss=0.09546, pruned_loss=0.0122, audio_tagging_loss=0.008084, over 16506.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08903, pruned_loss=0.01206, audio_tagging_loss=0.008587, over 3046849.97 frames. ], batch size: 64, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:28:56,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.771e+01 9.196e+01 9.896e+01 1.286e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 09:29:09,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3442873.3333333335, ans=0.125 2023-11-28 09:29:11,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.16 vs. limit=15.0 2023-11-28 09:29:15,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3442873.3333333335, ans=0.2 2023-11-28 09:29:17,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3442873.3333333335, ans=15.0 2023-11-28 09:29:30,204 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516450 2023-11-28 09:29:33,425 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11450, loss[loss=0.08168, simple_loss=0.1171, pruned_loss=0.01594, audio_tagging_loss=0.00718, over 15332.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08898, pruned_loss=0.01205, audio_tagging_loss=0.00856, over 3043902.23 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:29:58,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3443140.0, ans=0.125 2023-11-28 09:30:00,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.81 vs. limit=15.0 2023-11-28 09:30:18,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3443273.3333333335, ans=0.125 2023-11-28 09:30:27,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516500 2023-11-28 09:30:27,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3443273.3333333335, ans=0.125 2023-11-28 09:30:30,978 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11500, loss[loss=0.06456, simple_loss=0.09156, pruned_loss=0.01234, audio_tagging_loss=0.00644, over 16112.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08886, pruned_loss=0.01209, audio_tagging_loss=0.008602, over 3044864.39 frames. ], batch size: 62, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:30:33,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3443340.0, ans=0.02 2023-11-28 09:30:37,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3443340.0, ans=0.125 2023-11-28 09:30:39,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3443340.0, ans=0.2 2023-11-28 09:30:52,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.608e+01 9.367e+01 9.940e+01 1.192e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 09:30:59,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3443473.3333333335, ans=0.125 2023-11-28 09:31:25,535 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516550 2023-11-28 09:31:26,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3443606.6666666665, ans=0.1 2023-11-28 09:31:28,717 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11550, loss[loss=0.06648, simple_loss=0.09669, pruned_loss=0.01172, audio_tagging_loss=0.006419, over 15200.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08932, pruned_loss=0.01219, audio_tagging_loss=0.008553, over 3049982.06 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:32:06,696 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:32:21,768 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516600 2023-11-28 09:32:25,238 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11600, loss[loss=0.07338, simple_loss=0.1046, pruned_loss=0.01391, audio_tagging_loss=0.007166, over 15186.00 frames. 
], tot_loss[loss=0.06532, simple_loss=0.08932, pruned_loss=0.01209, audio_tagging_loss=0.008567, over 3046930.06 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:32:47,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.769e+01 9.333e+01 1.033e+02 1.788e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 09:32:54,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2023-11-28 09:33:04,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3444206.6666666665, ans=0.2 2023-11-28 09:33:14,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=22.5 2023-11-28 09:33:18,719 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516650 2023-11-28 09:33:22,536 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11650, loss[loss=0.07589, simple_loss=0.1054, pruned_loss=0.01605, audio_tagging_loss=0.007147, over 14816.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08955, pruned_loss=0.01217, audio_tagging_loss=0.008614, over 3042529.18 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:33:35,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3444406.6666666665, ans=0.125 2023-11-28 09:33:49,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-11-28 09:33:57,707 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:34:17,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516700 2023-11-28 09:34:20,380 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11700, loss[loss=0.07349, simple_loss=0.1006, pruned_loss=0.01372, audio_tagging_loss=0.009451, over 15652.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08894, pruned_loss=0.01222, audio_tagging_loss=0.008663, over 3036681.63 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:34:20,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3444673.3333333335, ans=0.125 2023-11-28 09:34:22,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3444673.3333333335, ans=0.125 2023-11-28 09:34:42,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.763e+01 9.224e+01 1.034e+02 1.340e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 09:34:44,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3444806.6666666665, ans=0.0 2023-11-28 09:35:12,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0 2023-11-28 09:35:14,282 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516750 2023-11-28 09:35:16,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.36 vs. 
limit=15.0 2023-11-28 09:35:17,411 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11750, loss[loss=0.07176, simple_loss=0.09762, pruned_loss=0.01536, audio_tagging_loss=0.007594, over 15228.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08876, pruned_loss=0.01235, audio_tagging_loss=0.008681, over 3046088.21 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:35:20,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3445006.6666666665, ans=0.125 2023-11-28 09:35:25,184 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:35:34,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3445073.3333333335, ans=0.125 2023-11-28 09:35:51,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3445206.6666666665, ans=0.95 2023-11-28 09:36:07,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3445273.3333333335, ans=0.125 2023-11-28 09:36:10,204 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516800 2023-11-28 09:36:11,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3445273.3333333335, ans=0.2 2023-11-28 09:36:14,212 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11800, loss[loss=0.04876, simple_loss=0.06485, pruned_loss=0.008269, audio_tagging_loss=0.008067, over 14760.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08817, pruned_loss=0.01214, audio_tagging_loss=0.008759, over 3049233.00 frames. 
], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:36:18,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3445340.0, ans=0.125 2023-11-28 09:36:25,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3445406.6666666665, ans=0.125 2023-11-28 09:36:34,273 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:36:37,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.864e+01 9.665e+01 1.018e+02 1.283e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 09:36:39,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3445473.3333333335, ans=0.2 2023-11-28 09:36:43,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3445473.3333333335, ans=0.125 2023-11-28 09:36:59,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3445606.6666666665, ans=0.125 2023-11-28 09:36:59,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3445606.6666666665, ans=0.0 2023-11-28 09:37:08,584 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516850 2023-11-28 09:37:08,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3445606.6666666665, ans=0.125 2023-11-28 09:37:12,340 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11850, loss[loss=0.04926, simple_loss=0.06383, pruned_loss=0.007798, audio_tagging_loss=0.009549, over 15738.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08862, pruned_loss=0.01232, audio_tagging_loss=0.008792, over 3045250.38 frames. ], batch size: 62, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:37:23,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445740.0, ans=0.1 2023-11-28 09:37:32,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3445740.0, ans=0.125 2023-11-28 09:37:37,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-28 09:37:42,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3445806.6666666665, ans=0.125 2023-11-28 09:37:49,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2023-11-28 09:37:58,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-11-28 09:38:04,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. 
limit=15.0 2023-11-28 09:38:06,185 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516900 2023-11-28 09:38:09,366 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11900, loss[loss=0.06909, simple_loss=0.09181, pruned_loss=0.01189, audio_tagging_loss=0.01129, over 16239.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08908, pruned_loss=0.01229, audio_tagging_loss=0.008877, over 3043952.38 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:38:14,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3446006.6666666665, ans=0.125 2023-11-28 09:38:32,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.705e+01 9.389e+01 1.010e+02 1.284e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 09:38:39,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-11-28 09:38:44,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-28 09:38:53,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3446206.6666666665, ans=0.09899494936611666 2023-11-28 09:38:58,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3446273.3333333335, ans=0.1 2023-11-28 09:38:59,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.81 vs. limit=10.0 2023-11-28 09:38:59,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3446273.3333333335, ans=0.1 2023-11-28 09:39:03,008 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516950 2023-11-28 09:39:06,129 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11950, loss[loss=0.06845, simple_loss=0.1006, pruned_loss=0.01154, audio_tagging_loss=0.006614, over 16338.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08921, pruned_loss=0.01225, audio_tagging_loss=0.008841, over 3049340.74 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:39:23,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=15.0 2023-11-28 09:39:43,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3446540.0, ans=0.05 2023-11-28 09:39:43,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3446540.0, ans=0.04949747468305833 2023-11-28 09:39:58,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517000 2023-11-28 09:40:01,987 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 12000, loss[loss=0.05591, simple_loss=0.07243, pruned_loss=0.008815, audio_tagging_loss=0.01088, over 15487.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08992, pruned_loss=0.01233, audio_tagging_loss=0.00897, over 3048876.77 frames. 
], batch size: 60, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:40:01,988 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 09:40:14,947 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.7833, 3.1004, 2.8660, 3.4764, 3.1321, 3.0351, 3.2867, 3.0062], device='cuda:3') 2023-11-28 09:40:36,959 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.05826, simple_loss=0.05053, pruned_loss=0.005231, audio_tagging_loss=0.02777, over 4681554.00 frames. 2023-11-28 09:40:36,960 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 09:40:57,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.981e+01 9.596e+01 1.044e+02 1.233e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 09:41:18,033 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 0, loss[loss=0.06931, simple_loss=0.07965, pruned_loss=0.009279, audio_tagging_loss=0.0202, over 14977.00 frames. ], tot_loss[loss=0.06931, simple_loss=0.07965, pruned_loss=0.009279, audio_tagging_loss=0.0202, over 14977.00 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:41:18,034 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 09:41:43,487 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9749, 5.8643, 5.6424, 5.5426], device='cuda:3') 2023-11-28 09:41:48,514 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8021, 5.8308, 5.8928, 5.8721], device='cuda:3') 2023-11-28 09:41:52,339 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05791, simple_loss=0.05054, pruned_loss=0.00521, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 09:41:52,339 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 09:41:58,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3446840.0, ans=0.1 2023-11-28 09:42:13,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3446906.6666666665, ans=0.1 2023-11-28 09:42:18,946 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517050 2023-11-28 09:42:19,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3446973.3333333335, ans=0.125 2023-11-28 09:42:46,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3447106.6666666665, ans=0.0 2023-11-28 09:42:50,888 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 50, loss[loss=0.06441, simple_loss=0.08138, pruned_loss=0.009748, audio_tagging_loss=0.01397, over 15079.00 frames. ], tot_loss[loss=0.07225, simple_loss=0.08854, pruned_loss=0.01133, audio_tagging_loss=0.01665, over 686710.86 frames. 
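The attn_weights_entropy tensors printed during the validation passes above presumably carry one value per attention head (four entries for the encoders.0 layers, eight for the encoders.3 layer shown above). High entropy means a head attends nearly uniformly over keys; low entropy means it is peaky. A sketch of the diagnostic, with the exact reduction over query positions assumed:

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, query_len, key_len), each row a distribution.
    p = attn_weights.clamp_min(1e-20)
    # Entropy over keys, averaged over query positions -> one value per head;
    # the maximum possible value is log(key_len), i.e. uniform attention.
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)
```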
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:42:51,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3447173.3333333335, ans=0.125 2023-11-28 09:43:04,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3447240.0, ans=0.2 2023-11-28 09:43:16,877 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517100 2023-11-28 09:43:25,526 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:43:38,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3447440.0, ans=0.125 2023-11-28 09:43:44,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.529e+01 9.824e+01 1.052e+02 1.128e+02 1.642e+02, threshold=2.105e+02, percent-clipped=0.0 2023-11-28 09:43:50,421 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 100, loss[loss=0.08789, simple_loss=0.1154, pruned_loss=0.01568, audio_tagging_loss=0.01453, over 15498.00 frames. ], tot_loss[loss=0.0723, simple_loss=0.08833, pruned_loss=0.01189, audio_tagging_loss=0.01624, over 1204782.84 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:44:11,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3447573.3333333335, ans=0.125 2023-11-28 09:44:15,362 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517150 2023-11-28 09:44:20,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3447640.0, ans=0.125 2023-11-28 09:44:29,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447706.6666666665, ans=0.1 2023-11-28 09:44:31,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3447706.6666666665, ans=0.2 2023-11-28 09:44:46,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3447840.0, ans=0.125 2023-11-28 09:44:47,931 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 150, loss[loss=0.07236, simple_loss=0.09554, pruned_loss=0.01242, audio_tagging_loss=0.01217, over 15772.00 frames. ], tot_loss[loss=0.07187, simple_loss=0.09004, pruned_loss=0.01225, audio_tagging_loss=0.01459, over 1613496.35 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:44:48,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.71 vs. limit=22.5 2023-11-28 09:44:59,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2023-11-28 09:45:02,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3447906.6666666665, ans=0.125 2023-11-28 09:45:14,035 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517200 2023-11-28 09:45:15,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.41 vs. 
limit=15.0 2023-11-28 09:45:17,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3447973.3333333335, ans=0.125 2023-11-28 09:45:29,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3448040.0, ans=0.0 2023-11-28 09:45:40,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3448106.6666666665, ans=0.125 2023-11-28 09:45:41,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 9.000e+01 9.478e+01 1.042e+02 1.328e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 09:45:46,283 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 200, loss[loss=0.07558, simple_loss=0.107, pruned_loss=0.01322, audio_tagging_loss=0.00884, over 15794.00 frames. ], tot_loss[loss=0.07026, simple_loss=0.09052, pruned_loss=0.01229, audio_tagging_loss=0.01272, over 1927014.07 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:46:06,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=12.0 2023-11-28 09:46:11,902 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517250 2023-11-28 09:46:15,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2023-11-28 09:46:24,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3448373.3333333335, ans=0.0 2023-11-28 09:46:30,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3448373.3333333335, ans=0.125 2023-11-28 09:46:43,885 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 250, loss[loss=0.05767, simple_loss=0.07784, pruned_loss=0.01115, audio_tagging_loss=0.007604, over 14893.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09066, pruned_loss=0.01236, audio_tagging_loss=0.01158, over 2171412.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:46:54,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2023-11-28 09:47:09,190 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517300 2023-11-28 09:47:09,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3448640.0, ans=0.125 2023-11-28 09:47:14,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3448640.0, ans=0.02 2023-11-28 09:47:22,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2023-11-28 09:47:36,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 9.287e+01 9.816e+01 1.058e+02 1.436e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-28 09:47:41,510 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 300, loss[loss=0.07194, simple_loss=0.09015, pruned_loss=0.01875, audio_tagging_loss=0.008111, over 15248.00 frames. ], tot_loss[loss=0.06887, simple_loss=0.09131, pruned_loss=0.01257, audio_tagging_loss=0.01064, over 2370012.11 frames. 
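In every optim.py record here the logged threshold is twice the middle of the five grad-norm order statistics, e.g. 2.0 x 9.478e+01 ~= 1.896e+02 just above, matching Clipping_scale=2.0. A toy clipper built around that observed relationship; a sketch only, the real optimizer logic in optim.py differs in details.

```python
import torch
from collections import deque

class QuartileClipper:
    """Track recent gradient norms, report five order statistics (the logged
    "quartiles": min, 25%, median, 75%, max) and clip at clipping_scale * median."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.history = deque(maxlen=window)

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        self.history.append(norm.item())
        hist = sorted(self.history)
        quartiles = [hist[int(q * (len(hist) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # 2.0 x median, as in the log
        if norm.item() > threshold:
            for p in params:
                p.grad.mul_(threshold / norm.item())
        return threshold
```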
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:47:42,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=12.0 2023-11-28 09:47:45,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3448840.0, ans=0.0 2023-11-28 09:47:48,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448840.0, ans=0.1 2023-11-28 09:48:07,247 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517350 2023-11-28 09:48:09,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-28 09:48:11,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3448973.3333333335, ans=0.0 2023-11-28 09:48:29,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3449106.6666666665, ans=0.125 2023-11-28 09:48:39,228 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 350, loss[loss=0.06828, simple_loss=0.08713, pruned_loss=0.01604, audio_tagging_loss=0.008667, over 14317.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09016, pruned_loss=0.01237, audio_tagging_loss=0.01007, over 2515409.87 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:48:51,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3449240.0, ans=0.1 2023-11-28 09:48:53,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3449240.0, ans=0.0 2023-11-28 09:48:56,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3449240.0, ans=0.125 2023-11-28 09:49:02,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3449306.6666666665, ans=0.2 2023-11-28 09:49:04,261 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517400 2023-11-28 09:49:18,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3449373.3333333335, ans=0.125 2023-11-28 09:49:32,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.082e+01 9.709e+01 1.033e+02 1.269e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 09:49:37,653 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 400, loss[loss=0.08127, simple_loss=0.1097, pruned_loss=0.0179, audio_tagging_loss=0.008504, over 14362.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08898, pruned_loss=0.01228, audio_tagging_loss=0.009684, over 2629110.60 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:49:54,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3449573.3333333335, ans=0.125 2023-11-28 09:50:03,325 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517450 2023-11-28 09:50:26,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3449773.3333333335, ans=0.1 2023-11-28 09:50:27,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3449773.3333333335, ans=0.125 2023-11-28 09:50:34,882 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 450, loss[loss=0.05608, simple_loss=0.07489, pruned_loss=0.009504, audio_tagging_loss=0.009133, over 14937.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.0898, pruned_loss=0.01233, audio_tagging_loss=0.00946, over 2715069.07 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:50:36,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3449840.0, ans=0.0 2023-11-28 09:50:36,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3449840.0, ans=0.2 2023-11-28 09:50:48,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2023-11-28 09:51:00,749 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517500 2023-11-28 09:51:00,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3449973.3333333335, ans=0.125 2023-11-28 09:51:08,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3450040.0, ans=0.125 2023-11-28 09:51:28,852 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.576e+01 9.362e+01 1.011e+02 1.317e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 09:51:32,722 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 500, loss[loss=0.07526, simple_loss=0.1156, pruned_loss=0.009949, audio_tagging_loss=0.00751, over 16284.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08966, pruned_loss=0.01219, audio_tagging_loss=0.009226, over 2784748.16 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:51:34,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3450173.3333333335, ans=0.0 2023-11-28 09:51:37,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3450173.3333333335, ans=0.125 2023-11-28 09:51:50,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3450240.0, ans=0.125 2023-11-28 09:51:50,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.84 vs. 
limit=15.0 2023-11-28 09:51:51,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3450240.0, ans=0.2 2023-11-28 09:51:52,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3450240.0, ans=0.1 2023-11-28 09:51:58,192 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517550 2023-11-28 09:52:06,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450373.3333333335, ans=0.1 2023-11-28 09:52:09,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450373.3333333335, ans=0.1 2023-11-28 09:52:12,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3450373.3333333335, ans=0.0 2023-11-28 09:52:30,044 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 550, loss[loss=0.06633, simple_loss=0.09916, pruned_loss=0.008025, audio_tagging_loss=0.008729, over 14469.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08978, pruned_loss=0.01229, audio_tagging_loss=0.009145, over 2839451.39 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:52:41,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3450573.3333333335, ans=0.125 2023-11-28 09:52:45,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-28 09:52:46,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0 2023-11-28 09:52:55,419 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517600 2023-11-28 09:53:24,170 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.868e+01 9.461e+01 1.003e+02 1.214e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 09:53:27,496 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 600, loss[loss=0.05729, simple_loss=0.07529, pruned_loss=0.009193, audio_tagging_loss=0.01045, over 13594.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09023, pruned_loss=0.01242, audio_tagging_loss=0.009053, over 2886151.73 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:53:44,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3450906.6666666665, ans=0.125 2023-11-28 09:53:49,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. 
limit=15.0 2023-11-28 09:53:53,157 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517650 2023-11-28 09:54:02,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3451040.0, ans=0.125 2023-11-28 09:54:13,011 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:54:17,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3451106.6666666665, ans=0.125 2023-11-28 09:54:22,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3451106.6666666665, ans=0.2 2023-11-28 09:54:25,026 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 650, loss[loss=0.06932, simple_loss=0.09623, pruned_loss=0.01397, audio_tagging_loss=0.007233, over 15402.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09, pruned_loss=0.01245, audio_tagging_loss=0.009024, over 2920581.64 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:54:30,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3451173.3333333335, ans=0.035 2023-11-28 09:54:40,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3451240.0, ans=0.125 2023-11-28 09:54:44,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-11-28 09:54:44,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3451240.0, ans=0.125 2023-11-28 09:54:47,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2023-11-28 09:54:50,113 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517700 2023-11-28 09:54:53,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3451306.6666666665, ans=0.2 2023-11-28 09:54:57,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3451373.3333333335, ans=0.125 2023-11-28 09:55:07,402 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:55:18,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.000e+01 9.495e+01 1.012e+02 1.235e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:55:21,740 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 700, loss[loss=0.06205, simple_loss=0.07341, pruned_loss=0.01296, audio_tagging_loss=0.01238, over 15542.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09055, pruned_loss=0.01244, audio_tagging_loss=0.008834, over 2952809.01 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:55:46,366 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517750 2023-11-28 09:55:55,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3451706.6666666665, ans=0.0 2023-11-28 09:56:08,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3451773.3333333335, ans=0.125 2023-11-28 09:56:18,698 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 750, loss[loss=0.09022, simple_loss=0.1184, pruned_loss=0.02274, audio_tagging_loss=0.008263, over 15328.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09017, pruned_loss=0.01247, audio_tagging_loss=0.00885, over 2976883.41 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:56:24,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3451840.0, ans=0.125 2023-11-28 09:56:36,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3451906.6666666665, ans=0.125 2023-11-28 09:56:42,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.53 vs. limit=10.0 2023-11-28 09:56:44,395 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517800 2023-11-28 09:56:54,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3452040.0, ans=0.125 2023-11-28 09:57:13,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.892e+01 9.576e+01 1.074e+02 1.448e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 09:57:15,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3452173.3333333335, ans=0.0 2023-11-28 09:57:15,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3452173.3333333335, ans=0.125 2023-11-28 09:57:16,397 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 800, loss[loss=0.04946, simple_loss=0.06337, pruned_loss=0.008403, audio_tagging_loss=0.009372, over 14563.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09114, pruned_loss=0.0126, audio_tagging_loss=0.008896, over 2992904.26 frames. 
], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:57:16,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3452173.3333333335, ans=0.05 2023-11-28 09:57:30,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3452240.0, ans=0.0 2023-11-28 09:57:30,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3452240.0, ans=0.2 2023-11-28 09:57:42,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517850 2023-11-28 09:57:53,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3452373.3333333335, ans=0.125 2023-11-28 09:58:12,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3452440.0, ans=0.1 2023-11-28 09:58:14,604 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 850, loss[loss=0.08309, simple_loss=0.1192, pruned_loss=0.01649, audio_tagging_loss=0.006985, over 15348.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.0907, pruned_loss=0.01257, audio_tagging_loss=0.008934, over 3007034.70 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:58:15,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3452506.6666666665, ans=0.2 2023-11-28 09:58:39,970 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517900 2023-11-28 09:58:48,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3452706.6666666665, ans=0.0 2023-11-28 09:59:10,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.934e+01 9.404e+01 1.018e+02 1.329e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 09:59:11,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2023-11-28 09:59:13,128 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 900, loss[loss=0.06001, simple_loss=0.08007, pruned_loss=0.01249, audio_tagging_loss=0.007486, over 15847.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09063, pruned_loss=0.01251, audio_tagging_loss=0.008987, over 3009289.16 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:59:23,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0 2023-11-28 09:59:37,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517950 2023-11-28 10:00:08,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2023-11-28 10:00:09,550 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 950, loss[loss=0.0777, simple_loss=0.09989, pruned_loss=0.01654, audio_tagging_loss=0.01121, over 14349.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09107, pruned_loss=0.01252, audio_tagging_loss=0.008903, over 3016088.59 frames. 
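The grad_scale field halves from 32.0 to 16.0 to 8.0 across batches 800 to 950 below and later doubles back (16.0 by batch 1200, 32.0 by batch 1600), which is the signature of dynamic loss scaling in fp16 training: the scale is halved when inf/nan gradients are detected and periodically grown otherwise. A minimal sketch of the standard PyTorch pattern; `model`, `optimizer` and `loader` are placeholders, not names from this recipe.

```python
import torch

scaler = torch.cuda.amp.GradScaler()      # dynamic loss scale, halved on overflow

for batch in loader:                      # placeholder training loop
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)
    scaler.scale(loss).backward()         # backprop on the scaled loss
    scaler.step(optimizer)                # skips the update if inf/nan grads were found
    scaler.update()                       # shrinks the scale after overflow, else grows it
```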
], batch size: 54, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:00:13,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3453173.3333333335, ans=0.1 2023-11-28 10:00:17,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3453173.3333333335, ans=0.125 2023-11-28 10:00:25,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3453240.0, ans=0.0 2023-11-28 10:00:28,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3453240.0, ans=0.0 2023-11-28 10:00:35,427 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518000 2023-11-28 10:00:47,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453373.3333333335, ans=0.1 2023-11-28 10:00:48,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3453373.3333333335, ans=0.125 2023-11-28 10:00:49,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3453373.3333333335, ans=0.1 2023-11-28 10:01:01,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3453440.0, ans=0.125 2023-11-28 10:01:03,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0 2023-11-28 10:01:05,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.698e+01 9.447e+01 1.001e+02 1.435e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 10:01:07,029 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1000, loss[loss=0.07623, simple_loss=0.1012, pruned_loss=0.01941, audio_tagging_loss=0.006237, over 14564.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.0896, pruned_loss=0.0123, audio_tagging_loss=0.008775, over 3015917.54 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:01:17,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3453506.6666666665, ans=0.0 2023-11-28 10:01:32,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518050 2023-11-28 10:01:33,678 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 10:01:40,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3453706.6666666665, ans=0.2 2023-11-28 10:01:54,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-28 10:01:56,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-28 10:01:59,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-28 10:02:03,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453773.3333333335, ans=0.1 2023-11-28 10:02:05,482 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1050, loss[loss=0.05972, simple_loss=0.07229, pruned_loss=0.01019, audio_tagging_loss=0.01339, over 15143.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08985, pruned_loss=0.0124, audio_tagging_loss=0.008714, over 3014550.57 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:02:26,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3453906.6666666665, ans=0.125 2023-11-28 10:02:30,882 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518100 2023-11-28 10:02:36,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-28 10:02:52,966 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:02:54,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3454106.6666666665, ans=0.05 2023-11-28 10:03:01,559 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.979e+01 9.409e+01 9.986e+01 1.298e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 10:03:02,653 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1100, loss[loss=0.05001, simple_loss=0.06618, pruned_loss=0.009034, audio_tagging_loss=0.007889, over 15123.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08997, pruned_loss=0.01231, audio_tagging_loss=0.008627, over 3019182.25 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:03:08,620 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 10:03:16,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3454240.0, ans=0.125 2023-11-28 10:03:17,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3454240.0, ans=0.1 2023-11-28 10:03:20,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-28 10:03:21,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2023-11-28 10:03:28,439 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518150 2023-11-28 10:03:28,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3454306.6666666665, ans=0.1 2023-11-28 10:03:38,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3454373.3333333335, ans=0.0 2023-11-28 10:03:47,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3454440.0, ans=0.125 2023-11-28 10:03:57,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3454440.0, ans=0.05 2023-11-28 10:03:59,614 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1150, loss[loss=0.05411, simple_loss=0.07337, pruned_loss=0.009163, audio_tagging_loss=0.008257, over 15842.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09066, pruned_loss=0.01247, audio_tagging_loss=0.008612, over 3025538.93 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:04:11,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.18 vs. limit=10.0 2023-11-28 10:04:15,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3454573.3333333335, ans=0.125 2023-11-28 10:04:24,976 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518200 2023-11-28 10:04:29,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=12.0 2023-11-28 10:04:29,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3454640.0, ans=0.0 2023-11-28 10:04:42,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3454706.6666666665, ans=0.125 2023-11-28 10:04:46,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3454773.3333333335, ans=0.125 2023-11-28 10:04:57,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.839e+01 9.353e+01 1.036e+02 1.275e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 10:04:58,226 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1200, loss[loss=0.0645, simple_loss=0.08941, pruned_loss=0.01155, audio_tagging_loss=0.008241, over 14981.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09004, pruned_loss=0.01261, audio_tagging_loss=0.00854, over 3031255.97 frames. 
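The WARNING records exclude AudioSet cuts whose placeholder transcript has more tokens (24) than the cut has frames after subsampling (23): the transducer loss needs at least one output frame per token, so such cuts cannot be aligned. A sketch of that check; the subsampling formula is a guess chosen to reproduce the logged 100 -> 23 mapping, and the exact frontend arithmetic in the training code may differ.

```python
SUBSAMPLING_FACTOR = 4  # assumed; (100 - 7) // 4 = 23 reproduces the logged mapping

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts the transducer loss cannot align: after the convolutional
    frontend there must be at least one frame per output token."""
    frames_after = (num_frames - 7) // SUBSAMPLING_FACTOR
    return frames_after >= num_tokens

# The excluded cuts above: 100 frames -> 23 after subsampling, but 24 tokens.
assert keep_cut(100, 24) is False
```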
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:04:59,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3454840.0, ans=0.125 2023-11-28 10:05:00,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3454840.0, ans=0.125 2023-11-28 10:05:08,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3454906.6666666665, ans=0.2 2023-11-28 10:05:22,755 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518250 2023-11-28 10:05:31,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2023-11-28 10:05:35,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3455040.0, ans=0.125 2023-11-28 10:05:37,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3455040.0, ans=0.0 2023-11-28 10:05:42,556 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:05:54,740 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1250, loss[loss=0.05297, simple_loss=0.07097, pruned_loss=0.008916, audio_tagging_loss=0.008569, over 14518.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08892, pruned_loss=0.01227, audio_tagging_loss=0.008501, over 3027156.24 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:05:59,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3455173.3333333335, ans=0.125 2023-11-28 10:06:03,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3455173.3333333335, ans=0.125 2023-11-28 10:06:18,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2023-11-28 10:06:20,698 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518300 2023-11-28 10:06:27,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3455306.6666666665, ans=0.125 2023-11-28 10:06:50,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.649e+01 9.225e+01 9.865e+01 1.174e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 10:06:51,955 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1300, loss[loss=0.055, simple_loss=0.0853, pruned_loss=0.005698, audio_tagging_loss=0.00665, over 15254.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08929, pruned_loss=0.01222, audio_tagging_loss=0.008503, over 3028354.81 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:06:54,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3455506.6666666665, ans=0.2 2023-11-28 10:07:11,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3455573.3333333335, ans=0.1 2023-11-28 10:07:15,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3455640.0, ans=0.0 2023-11-28 10:07:15,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3455640.0, ans=0.125 2023-11-28 10:07:17,147 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518350 2023-11-28 10:07:36,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3455773.3333333335, ans=0.1 2023-11-28 10:07:49,281 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1350, loss[loss=0.08809, simple_loss=0.1144, pruned_loss=0.02399, audio_tagging_loss=0.006882, over 15225.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08968, pruned_loss=0.01235, audio_tagging_loss=0.008576, over 3033386.40 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:08:09,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455906.6666666665, ans=0.1 2023-11-28 10:08:14,040 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518400 2023-11-28 10:08:33,612 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:08:34,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=8.0 2023-11-28 10:08:44,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3456106.6666666665, ans=0.2 2023-11-28 10:08:45,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.591e+01 9.504e+01 1.020e+02 1.211e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 10:08:46,233 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1400, loss[loss=0.06254, simple_loss=0.08299, pruned_loss=0.01167, audio_tagging_loss=0.009369, over 15432.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08923, pruned_loss=0.01238, audio_tagging_loss=0.00872, over 3036709.68 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:08:50,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3456173.3333333335, ans=0.04949747468305833 2023-11-28 10:09:11,818 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518450 2023-11-28 10:09:12,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=3456306.6666666665, ans=15.0 2023-11-28 10:09:26,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3456373.3333333335, ans=0.1 2023-11-28 10:09:37,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3456440.0, ans=0.0 2023-11-28 10:09:40,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-28 10:09:43,528 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1450, loss[loss=0.07685, simple_loss=0.1056, pruned_loss=0.01687, audio_tagging_loss=0.007173, over 15162.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08933, pruned_loss=0.01234, audio_tagging_loss=0.008745, over 3043219.76 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:09:49,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3456506.6666666665, ans=0.125 2023-11-28 10:10:04,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.11 vs. limit=10.0 2023-11-28 10:10:07,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3456640.0, ans=0.0 2023-11-28 10:10:08,630 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518500 2023-11-28 10:10:11,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3456640.0, ans=0.2 2023-11-28 10:10:13,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-28 10:10:19,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3456706.6666666665, ans=0.2 2023-11-28 10:10:19,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3456706.6666666665, ans=0.2 2023-11-28 10:10:39,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.920e+01 9.408e+01 1.027e+02 1.400e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 10:10:41,241 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1500, loss[loss=0.05375, simple_loss=0.07981, pruned_loss=0.008497, audio_tagging_loss=0.005351, over 14860.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08945, pruned_loss=0.01222, audio_tagging_loss=0.008723, over 3051650.18 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:10:43,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3456840.0, ans=0.0 2023-11-28 10:10:43,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3456840.0, ans=0.125 2023-11-28 10:10:52,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3456906.6666666665, ans=0.1 2023-11-28 10:10:54,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3456906.6666666665, ans=0.1 2023-11-28 10:11:06,398 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518550 2023-11-28 10:11:06,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3456973.3333333335, ans=0.0 2023-11-28 10:11:10,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2023-11-28 10:11:25,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3457040.0, ans=0.125 2023-11-28 10:11:29,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3457106.6666666665, ans=0.125 2023-11-28 10:11:33,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3457106.6666666665, ans=0.125 2023-11-28 10:11:37,957 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1550, loss[loss=0.07715, simple_loss=0.1082, pruned_loss=0.01423, audio_tagging_loss=0.008828, over 15448.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09009, pruned_loss=0.01232, audio_tagging_loss=0.008817, over 3047519.72 frames. 
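The scaling.py:213 records each print name=..., batch_count=..., ans=...: hyperparameters such as dropout probabilities, skip rates and balancer limits are not constants but functions of the global batch count, and the current value (ans) is sampled for logging. A toy piecewise-linear schedule in that spirit; the class below is a sketch of the idea, not the ScheduledFloat implementation in scaling.py.

```python
import bisect

class PiecewiseSchedule:
    """value(batch_count) interpolated linearly between (batch, value) knots,
    clamped to the end values outside the knot range."""

    def __init__(self, *knots):
        self.xs = [x for x, _ in knots]
        self.ys = [y for _, y in knots]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = PiecewiseSchedule((0, 0.3), (20000, 0.1))  # hypothetical knots
print(dropout_p(3456906.67))  # far past the last knot -> 0.1, like the ans=0.1 records
```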
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:11:39,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3457173.3333333335, ans=0.2 2023-11-28 10:11:45,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3457173.3333333335, ans=0.0 2023-11-28 10:11:48,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3457240.0, ans=0.125 2023-11-28 10:11:52,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3457240.0, ans=0.125 2023-11-28 10:11:52,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3457240.0, ans=0.04949747468305833 2023-11-28 10:12:03,088 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518600 2023-11-28 10:12:10,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3457306.6666666665, ans=0.125 2023-11-28 10:12:14,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3457373.3333333335, ans=0.125 2023-11-28 10:12:16,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3457373.3333333335, ans=0.0 2023-11-28 10:12:17,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3457373.3333333335, ans=0.125 2023-11-28 10:12:20,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2023-11-28 10:12:35,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.956e+01 9.382e+01 1.022e+02 1.472e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 10:12:36,218 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1600, loss[loss=0.05808, simple_loss=0.07862, pruned_loss=0.01108, audio_tagging_loss=0.00769, over 14404.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.0902, pruned_loss=0.01242, audio_tagging_loss=0.008801, over 3038899.09 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:12:38,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3457506.6666666665, ans=0.07 2023-11-28 10:12:41,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.88 vs. 
limit=15.0 2023-11-28 10:12:46,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3457573.3333333335, ans=0.0 2023-11-28 10:12:53,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3457573.3333333335, ans=0.2 2023-11-28 10:13:01,284 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518650 2023-11-28 10:13:05,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3457640.0, ans=0.125 2023-11-28 10:13:27,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3457773.3333333335, ans=0.2 2023-11-28 10:13:31,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3457773.3333333335, ans=0.125 2023-11-28 10:13:33,651 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1650, loss[loss=0.06551, simple_loss=0.09561, pruned_loss=0.008508, audio_tagging_loss=0.009198, over 16092.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08946, pruned_loss=0.01229, audio_tagging_loss=0.008878, over 3040550.24 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:13:43,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3457840.0, ans=0.125 2023-11-28 10:13:58,871 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518700 2023-11-28 10:14:11,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3458040.0, ans=0.0 2023-11-28 10:14:23,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3458106.6666666665, ans=0.125 2023-11-28 10:14:24,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3458106.6666666665, ans=0.0 2023-11-28 10:14:26,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0 2023-11-28 10:14:29,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3458106.6666666665, ans=0.1 2023-11-28 10:14:30,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.751e+01 9.360e+01 1.005e+02 1.461e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 10:14:31,136 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1700, loss[loss=0.07058, simple_loss=0.09746, pruned_loss=0.01228, audio_tagging_loss=0.009575, over 15682.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08931, pruned_loss=0.0123, audio_tagging_loss=0.008995, over 3050850.36 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:14:37,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0 2023-11-28 10:14:53,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3458306.6666666665, ans=0.125 2023-11-28 10:14:55,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.14 vs. 
limit=15.0 2023-11-28 10:14:56,391 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518750 2023-11-28 10:15:28,835 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1750, loss[loss=0.05284, simple_loss=0.07664, pruned_loss=0.008364, audio_tagging_loss=0.006157, over 14048.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08979, pruned_loss=0.01236, audio_tagging_loss=0.008871, over 3049483.69 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:15:46,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=12.0 2023-11-28 10:15:54,025 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518800 2023-11-28 10:16:04,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. limit=10.0 2023-11-28 10:16:08,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3458706.6666666665, ans=0.125 2023-11-28 10:16:20,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3458773.3333333335, ans=0.2 2023-11-28 10:16:25,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.578e+01 9.174e+01 9.766e+01 1.256e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-28 10:16:25,452 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1800, loss[loss=0.07589, simple_loss=0.1058, pruned_loss=0.01833, audio_tagging_loss=0.00465, over 15139.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08871, pruned_loss=0.01226, audio_tagging_loss=0.008803, over 3047157.28 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:16:28,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3458840.0, ans=0.0 2023-11-28 10:16:39,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3458906.6666666665, ans=0.0 2023-11-28 10:16:50,460 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518850 2023-11-28 10:16:56,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3458973.3333333335, ans=0.0 2023-11-28 10:17:07,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3459040.0, ans=0.07 2023-11-28 10:17:22,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2023-11-28 10:17:23,166 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1850, loss[loss=0.06463, simple_loss=0.09209, pruned_loss=0.01084, audio_tagging_loss=0.007743, over 14818.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08977, pruned_loss=0.01233, audio_tagging_loss=0.008682, over 3048452.33 frames. 
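The scaling.py:1022 Whitening records (e.g. metric=8.59 vs. limit=12.0 above) compare how uneven the covariance spectrum of a layer's activations is against a limit; values are logged when the metric approaches or exceeds it. A toy version of such a metric follows: it equals 1.0 for perfectly white features and grows as the spectrum concentrates. This is a sketch of the concept, not the exact scaling.py formula.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns d * trace(C @ C) / trace(C)**2
    for the channel covariance C, which is 1.0 iff all eigenvalues are equal
    (white features) and larger when a few directions dominate."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2).item()

white = torch.randn(10000, 192)                 # roughly white -> metric near 1.0
spiky = white * torch.linspace(0.1, 3.0, 192)   # uneven spectrum -> larger metric
print(whitening_metric(white), whitening_metric(spiky))
```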
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:17:47,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518900 2023-11-28 10:17:50,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3459306.6666666665, ans=0.1 2023-11-28 10:17:52,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3459306.6666666665, ans=0.0 2023-11-28 10:18:12,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2023-11-28 10:18:14,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3459440.0, ans=0.125 2023-11-28 10:18:14,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2023-11-28 10:18:17,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3459440.0, ans=0.125 2023-11-28 10:18:19,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.665e+01 9.197e+01 1.005e+02 1.247e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 10:18:19,452 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1900, loss[loss=0.06432, simple_loss=0.08592, pruned_loss=0.01373, audio_tagging_loss=0.007635, over 13721.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08896, pruned_loss=0.01224, audio_tagging_loss=0.008619, over 3049991.24 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:18:20,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3459506.6666666665, ans=0.1 2023-11-28 10:18:40,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3459573.3333333335, ans=0.125 2023-11-28 10:18:41,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3459573.3333333335, ans=0.0 2023-11-28 10:18:45,646 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518950 2023-11-28 10:18:49,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3459640.0, ans=0.5 2023-11-28 10:18:52,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3459640.0, ans=0.0 2023-11-28 10:19:04,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3459773.3333333335, ans=0.2 2023-11-28 10:19:16,890 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1950, loss[loss=0.06165, simple_loss=0.08879, pruned_loss=0.00961, audio_tagging_loss=0.007638, over 15146.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08789, pruned_loss=0.01205, audio_tagging_loss=0.008707, over 3045838.94 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:19:25,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3459840.0, ans=0.0 2023-11-28 10:19:32,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3459906.6666666665, ans=0.0 2023-11-28 10:19:34,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-28 10:19:41,691 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519000 2023-11-28 10:19:44,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3459973.3333333335, ans=0.0 2023-11-28 10:20:12,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3460106.6666666665, ans=0.07 2023-11-28 10:20:14,530 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.984e+01 9.500e+01 1.035e+02 1.289e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 10:20:14,557 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2000, loss[loss=0.07524, simple_loss=0.0996, pruned_loss=0.01555, audio_tagging_loss=0.009887, over 16508.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08737, pruned_loss=0.01207, audio_tagging_loss=0.008721, over 3049012.32 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:20:22,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3460173.3333333335, ans=0.125 2023-11-28 10:20:39,440 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519050 2023-11-28 10:21:02,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3460440.0, ans=0.0 2023-11-28 10:21:11,328 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2050, loss[loss=0.06964, simple_loss=0.1006, pruned_loss=0.01168, audio_tagging_loss=0.007649, over 15824.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08817, pruned_loss=0.01204, audio_tagging_loss=0.008635, over 3040983.05 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:21:13,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3460506.6666666665, ans=0.125 2023-11-28 10:21:13,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3460506.6666666665, ans=0.1 2023-11-28 10:21:15,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460506.6666666665, ans=0.1 2023-11-28 10:21:18,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3460506.6666666665, ans=0.2 2023-11-28 10:21:28,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.49 vs. 
limit=22.5 2023-11-28 10:21:38,249 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519100 2023-11-28 10:21:39,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3460640.0, ans=0.125 2023-11-28 10:21:50,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0 2023-11-28 10:22:01,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3460773.3333333335, ans=0.125 2023-11-28 10:22:09,701 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2100, loss[loss=0.06356, simple_loss=0.09281, pruned_loss=0.01112, audio_tagging_loss=0.006035, over 14206.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08753, pruned_loss=0.01193, audio_tagging_loss=0.008619, over 3036332.23 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:22:10,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.721e+01 9.366e+01 1.002e+02 1.628e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 10:22:33,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3460973.3333333335, ans=0.125 2023-11-28 10:22:35,443 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519150 2023-11-28 10:23:08,710 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2150, loss[loss=0.08466, simple_loss=0.1198, pruned_loss=0.01594, audio_tagging_loss=0.008818, over 15210.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08896, pruned_loss=0.01215, audio_tagging_loss=0.008525, over 3036068.18 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:23:26,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3461240.0, ans=0.125 2023-11-28 10:23:32,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-28 10:23:33,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0 2023-11-28 10:23:33,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.74 vs. limit=6.0 2023-11-28 10:23:33,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519200 2023-11-28 10:23:40,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3461306.6666666665, ans=0.1 2023-11-28 10:23:43,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3461373.3333333335, ans=0.125 2023-11-28 10:23:48,066 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 10:23:48,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3461373.3333333335, ans=0.125 2023-11-28 10:24:07,074 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2200, loss[loss=0.08205, simple_loss=0.1143, pruned_loss=0.01831, audio_tagging_loss=0.006581, over 15338.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08874, pruned_loss=0.01197, audio_tagging_loss=0.008571, over 3039859.67 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:24:08,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.940e+01 9.417e+01 1.003e+02 1.474e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 10:24:26,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-11-28 10:24:32,994 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519250 2023-11-28 10:24:35,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3461640.0, ans=0.0 2023-11-28 10:25:00,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2023-11-28 10:25:04,102 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2250, loss[loss=0.0658, simple_loss=0.0868, pruned_loss=0.01143, audio_tagging_loss=0.01097, over 15891.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08885, pruned_loss=0.01198, audio_tagging_loss=0.00864, over 3041937.73 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:25:07,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=12.0 2023-11-28 10:25:23,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3461906.6666666665, ans=0.1 2023-11-28 10:25:25,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3461906.6666666665, ans=0.125 2023-11-28 10:25:29,750 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519300 2023-11-28 10:25:34,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2023-11-28 10:25:36,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3461973.3333333335, ans=0.1 2023-11-28 10:25:38,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3462040.0, ans=0.1 2023-11-28 10:25:48,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3462040.0, ans=0.2 2023-11-28 10:25:52,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0 2023-11-28 10:26:02,940 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2300, loss[loss=0.07004, simple_loss=0.1007, pruned_loss=0.01069, audio_tagging_loss=0.008987, over 15337.00 frames. 
], tot_loss[loss=0.0657, simple_loss=0.08969, pruned_loss=0.01219, audio_tagging_loss=0.008662, over 3040275.22 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:26:04,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.619e+01 8.792e+01 9.298e+01 1.006e+02 1.302e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 10:26:25,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2023-11-28 10:26:28,107 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519350 2023-11-28 10:26:28,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0 2023-11-28 10:26:29,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2023-11-28 10:26:37,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-28 10:26:39,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3462373.3333333335, ans=0.125 2023-11-28 10:26:51,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3462440.0, ans=0.125 2023-11-28 10:26:55,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-11-28 10:26:56,140 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:26:58,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3462440.0, ans=0.125 2023-11-28 10:27:00,537 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2350, loss[loss=0.04808, simple_loss=0.05512, pruned_loss=0.008319, audio_tagging_loss=0.0122, over 15036.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08891, pruned_loss=0.01214, audio_tagging_loss=0.008793, over 3046617.51 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:27:06,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-28 10:27:10,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3462506.6666666665, ans=0.2 2023-11-28 10:27:20,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3462573.3333333335, ans=0.125 2023-11-28 10:27:25,730 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519400 2023-11-28 10:27:26,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.38 vs. 
limit=15.0 2023-11-28 10:27:29,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3462640.0, ans=0.1 2023-11-28 10:27:34,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3462706.6666666665, ans=0.125 2023-11-28 10:27:42,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3462706.6666666665, ans=0.0 2023-11-28 10:27:45,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=12.0 2023-11-28 10:27:45,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.26 vs. limit=22.5 2023-11-28 10:27:55,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3462773.3333333335, ans=0.0 2023-11-28 10:27:59,272 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2400, loss[loss=0.08201, simple_loss=0.1193, pruned_loss=0.0152, audio_tagging_loss=0.007142, over 14171.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09023, pruned_loss=0.01231, audio_tagging_loss=0.008768, over 3040583.12 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:28:00,335 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.676e+01 9.385e+01 1.010e+02 1.342e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 10:28:03,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.28 vs. limit=10.0 2023-11-28 10:28:15,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3462906.6666666665, ans=0.0 2023-11-28 10:28:25,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519450 2023-11-28 10:28:30,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3462973.3333333335, ans=0.125 2023-11-28 10:28:54,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3463106.6666666665, ans=0.125 2023-11-28 10:28:58,278 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2450, loss[loss=0.07008, simple_loss=0.0987, pruned_loss=0.01418, audio_tagging_loss=0.006556, over 15065.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08998, pruned_loss=0.01218, audio_tagging_loss=0.008825, over 3036772.65 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:29:05,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3463173.3333333335, ans=0.125 2023-11-28 10:29:23,758 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519500 2023-11-28 10:29:30,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3463306.6666666665, ans=0.0 2023-11-28 10:29:33,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3463373.3333333335, ans=0.125 2023-11-28 10:29:56,335 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2500, loss[loss=0.05546, simple_loss=0.06897, pruned_loss=0.01169, audio_tagging_loss=0.009291, over 14804.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08967, pruned_loss=0.01227, audio_tagging_loss=0.008914, over 3031600.78 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:29:57,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.648e+01 9.240e+01 1.001e+02 1.352e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 10:29:57,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3463506.6666666665, ans=0.2 2023-11-28 10:30:16,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3463573.3333333335, ans=0.125 2023-11-28 10:30:21,348 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519550 2023-11-28 10:30:25,844 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:30:43,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=12.0 2023-11-28 10:30:54,578 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2550, loss[loss=0.06842, simple_loss=0.09506, pruned_loss=0.01273, audio_tagging_loss=0.008154, over 15926.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08931, pruned_loss=0.0122, audio_tagging_loss=0.008899, over 3035282.44 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:30:54,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3463840.0, ans=0.07 2023-11-28 10:31:17,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3463973.3333333335, ans=0.125 2023-11-28 10:31:19,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519600 2023-11-28 10:31:25,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3463973.3333333335, ans=0.125 2023-11-28 10:31:29,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3464040.0, ans=0.0 2023-11-28 10:31:53,546 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2600, loss[loss=0.06851, simple_loss=0.09462, pruned_loss=0.0119, audio_tagging_loss=0.009305, over 15404.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08907, pruned_loss=0.01213, audio_tagging_loss=0.008774, over 3037086.08 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:31:56,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.673e+01 9.368e+01 9.896e+01 1.178e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 10:32:05,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3464240.0, ans=0.2 2023-11-28 10:32:08,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.43 vs. limit=22.5 2023-11-28 10:32:17,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3464306.6666666665, ans=0.0 2023-11-28 10:32:17,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-11-28 10:32:18,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2023-11-28 10:32:19,441 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519650 2023-11-28 10:32:30,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0 2023-11-28 10:32:52,185 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2650, loss[loss=0.06099, simple_loss=0.08376, pruned_loss=0.01133, audio_tagging_loss=0.007785, over 14780.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08868, pruned_loss=0.01209, audio_tagging_loss=0.00874, over 3031600.80 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:33:09,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2023-11-28 10:33:13,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-11-28 10:33:17,823 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519700 2023-11-28 10:33:31,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3464706.6666666665, ans=0.2 2023-11-28 10:33:35,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3464706.6666666665, ans=0.0 2023-11-28 10:33:50,929 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2700, loss[loss=0.08636, simple_loss=0.1267, pruned_loss=0.01662, audio_tagging_loss=0.006394, over 16138.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08934, pruned_loss=0.01224, audio_tagging_loss=0.008582, over 3034505.53 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:33:54,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 9.167e+01 9.683e+01 1.022e+02 1.162e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-28 10:34:16,138 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519750 2023-11-28 10:34:23,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. 
limit=15.0 2023-11-28 10:34:43,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2023-11-28 10:34:43,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.39 vs. limit=22.5 2023-11-28 10:34:47,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3465173.3333333335, ans=10.0 2023-11-28 10:34:47,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.02 vs. limit=6.0 2023-11-28 10:34:48,174 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2750, loss[loss=0.06643, simple_loss=0.08746, pruned_loss=0.01233, audio_tagging_loss=0.01036, over 15359.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08837, pruned_loss=0.01197, audio_tagging_loss=0.008626, over 3036035.14 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:34:57,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.84 vs. limit=10.0 2023-11-28 10:34:58,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465173.3333333335, ans=0.1 2023-11-28 10:35:00,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-28 10:35:14,292 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519800 2023-11-28 10:35:21,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3465306.6666666665, ans=0.125 2023-11-28 10:35:24,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465373.3333333335, ans=0.1 2023-11-28 10:35:27,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3465373.3333333335, ans=0.125 2023-11-28 10:35:29,897 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:35:32,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2023-11-28 10:35:34,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3465440.0, ans=0.95 2023-11-28 10:35:42,907 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:35:47,360 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2800, loss[loss=0.04728, simple_loss=0.06359, pruned_loss=0.007345, audio_tagging_loss=0.008139, over 15152.00 frames. 
], tot_loss[loss=0.06461, simple_loss=0.08793, pruned_loss=0.01202, audio_tagging_loss=0.008618, over 3037472.67 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:35:50,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.532e+01 9.536e+01 1.008e+02 1.642e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 10:35:56,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465506.6666666665, ans=0.1 2023-11-28 10:36:04,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0 2023-11-28 10:36:09,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3465640.0, ans=0.2 2023-11-28 10:36:12,952 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519850 2023-11-28 10:36:25,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3465706.6666666665, ans=0.125 2023-11-28 10:36:45,207 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2850, loss[loss=0.06313, simple_loss=0.08306, pruned_loss=0.01136, audio_tagging_loss=0.01024, over 14415.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08823, pruned_loss=0.01212, audio_tagging_loss=0.008541, over 3041896.09 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:36:45,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3465840.0, ans=0.125 2023-11-28 10:36:46,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465840.0, ans=0.1 2023-11-28 10:37:04,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3465906.6666666665, ans=0.125 2023-11-28 10:37:11,153 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519900 2023-11-28 10:37:11,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0 2023-11-28 10:37:22,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2023-11-28 10:37:28,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3466040.0, ans=0.125 2023-11-28 10:37:36,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3466106.6666666665, ans=0.125 2023-11-28 10:37:43,590 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2900, loss[loss=0.06696, simple_loss=0.08969, pruned_loss=0.01301, audio_tagging_loss=0.009112, over 15281.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08878, pruned_loss=0.01215, audio_tagging_loss=0.008558, over 3043089.68 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:37:46,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=3466173.3333333335, ans=12.0 2023-11-28 10:37:46,880 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.834e+01 9.612e+01 1.019e+02 1.318e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 10:38:08,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3466306.6666666665, ans=0.125 2023-11-28 10:38:09,131 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519950 2023-11-28 10:38:13,903 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:38:40,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3466440.0, ans=0.0 2023-11-28 10:38:42,312 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2950, loss[loss=0.07557, simple_loss=0.1076, pruned_loss=0.01539, audio_tagging_loss=0.006396, over 16108.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08919, pruned_loss=0.01218, audio_tagging_loss=0.008626, over 3040681.36 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:38:45,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3466506.6666666665, ans=0.2 2023-11-28 10:38:53,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3466573.3333333335, ans=0.0 2023-11-28 10:39:08,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520000 2023-11-28 10:39:08,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-11-28 10:39:23,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3466706.6666666665, ans=0.0 2023-11-28 10:39:32,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3466773.3333333335, ans=0.025 2023-11-28 10:39:33,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3466773.3333333335, ans=0.125 2023-11-28 10:39:42,311 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3000, loss[loss=0.05267, simple_loss=0.06903, pruned_loss=0.01017, audio_tagging_loss=0.007989, over 15404.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08988, pruned_loss=0.01232, audio_tagging_loss=0.00867, over 3049300.28 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:39:42,311 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 10:40:18,159 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05741, simple_loss=0.05054, pruned_loss=0.005252, audio_tagging_loss=0.02689, over 4681554.00 frames. 
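Note on the loss figures above: each "loss[...]" entry is the average over a single batch's frames, while "tot_loss[...]" is a running average over all frames seen so far in the epoch (the "over N frames" count). A minimal Python sketch of such frame-weighted bookkeeping follows; the names are illustrative assumptions only, not the actual train_asr.py internals.

    # Frame-weighted running averages, as suggested by the "loss[... over N frames]"
    # entries above. Illustrative sketch only; names do not come from train_asr.py.
    from collections import defaultdict

    class RunningLoss:
        def __init__(self) -> None:
            self.sums = defaultdict(float)   # per-component weighted sums
            self.frames = 0.0                # total frames accumulated so far

        def update(self, components: dict, num_frames: float) -> None:
            # Weight each loss component by the batch's frame count.
            for name, value in components.items():
                self.sums[name] += value * num_frames
            self.frames += num_frames

        def averages(self) -> dict:
            # The printed values are weighted sums divided by total frames.
            return {name: s / self.frames for name, s in self.sums.items()}

Dividing a single batch's update by its own frame count recovers the per-batch "loss[...]" figure; accumulating across batches yields the running "tot_loss[...]" values.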
2023-11-28 10:40:18,160 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 10:40:21,404 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.904e+01 9.559e+01 1.030e+02 1.233e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 10:40:23,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2023-11-28 10:40:25,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3466840.0, ans=0.125 2023-11-28 10:40:42,555 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520050 2023-11-28 10:41:07,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3467106.6666666665, ans=0.125 2023-11-28 10:41:14,814 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:41:15,704 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3050, loss[loss=0.07566, simple_loss=0.1013, pruned_loss=0.01522, audio_tagging_loss=0.009804, over 14749.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08952, pruned_loss=0.01224, audio_tagging_loss=0.008823, over 3044966.65 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:41:31,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3467240.0, ans=0.0 2023-11-28 10:41:41,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520100 2023-11-28 10:41:42,808 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:41:45,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3467306.6666666665, ans=0.0 2023-11-28 10:41:53,534 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:42:13,295 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3100, loss[loss=0.05471, simple_loss=0.06949, pruned_loss=0.01058, audio_tagging_loss=0.009381, over 14691.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09052, pruned_loss=0.01232, audio_tagging_loss=0.008754, over 3045929.59 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:42:16,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.845e+01 9.349e+01 1.011e+02 1.262e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 10:42:32,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3467573.3333333335, ans=0.125 2023-11-28 10:42:33,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.08 vs. 
limit=12.0 2023-11-28 10:42:40,029 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520150 2023-11-28 10:42:50,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3467706.6666666665, ans=0.2 2023-11-28 10:42:50,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3467706.6666666665, ans=0.125 2023-11-28 10:42:59,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3467773.3333333335, ans=0.1 2023-11-28 10:43:11,808 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3150, loss[loss=0.0597, simple_loss=0.07717, pruned_loss=0.01039, audio_tagging_loss=0.01073, over 14889.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09055, pruned_loss=0.01239, audio_tagging_loss=0.008762, over 3036895.57 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:43:28,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3467906.6666666665, ans=0.07 2023-11-28 10:43:37,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520200 2023-11-28 10:43:37,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3467973.3333333335, ans=0.125 2023-11-28 10:44:10,808 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3200, loss[loss=0.07604, simple_loss=0.09773, pruned_loss=0.0191, audio_tagging_loss=0.008073, over 15202.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09047, pruned_loss=0.01239, audio_tagging_loss=0.008824, over 3043906.13 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:44:13,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3468173.3333333335, ans=0.035 2023-11-28 10:44:14,053 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.853e+01 9.488e+01 1.043e+02 1.212e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 10:44:32,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2023-11-28 10:44:35,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520250 2023-11-28 10:44:53,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3468373.3333333335, ans=0.125 2023-11-28 10:45:07,163 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3250, loss[loss=0.08653, simple_loss=0.1162, pruned_loss=0.02186, audio_tagging_loss=0.006543, over 14896.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09143, pruned_loss=0.01264, audio_tagging_loss=0.008776, over 3047861.79 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:45:22,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3468573.3333333335, ans=0.125 2023-11-28 10:45:24,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3468573.3333333335, ans=0.0 2023-11-28 10:45:29,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.18 vs. 
limit=15.0 2023-11-28 10:45:33,415 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520300 2023-11-28 10:45:34,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-28 10:45:37,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3468640.0, ans=0.125 2023-11-28 10:45:54,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3468773.3333333335, ans=0.125 2023-11-28 10:46:05,078 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3300, loss[loss=0.06272, simple_loss=0.08458, pruned_loss=0.01227, audio_tagging_loss=0.008157, over 15344.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09035, pruned_loss=0.01247, audio_tagging_loss=0.008923, over 3054839.04 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:46:08,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 8.967e+01 9.560e+01 1.010e+02 1.793e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 10:46:17,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3468906.6666666665, ans=0.2 2023-11-28 10:46:22,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3468906.6666666665, ans=0.0 2023-11-28 10:46:24,323 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:46:29,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3468973.3333333335, ans=0.125 2023-11-28 10:46:30,796 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520350 2023-11-28 10:46:49,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3469040.0, ans=0.125 2023-11-28 10:46:55,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3469106.6666666665, ans=0.125 2023-11-28 10:47:03,677 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3350, loss[loss=0.06318, simple_loss=0.0911, pruned_loss=0.01061, audio_tagging_loss=0.007029, over 15838.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09051, pruned_loss=0.01236, audio_tagging_loss=0.008801, over 3051990.59 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:47:05,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3469173.3333333335, ans=0.125 2023-11-28 10:47:20,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3469240.0, ans=0.1 2023-11-28 10:47:25,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3469306.6666666665, ans=0.125 2023-11-28 10:47:28,680 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520400 2023-11-28 10:47:36,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3469306.6666666665, ans=0.125 2023-11-28 10:48:01,329 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3400, loss[loss=0.07245, simple_loss=0.09916, pruned_loss=0.01518, audio_tagging_loss=0.007684, over 15914.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09028, pruned_loss=0.01237, audio_tagging_loss=0.00875, over 3051599.07 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:48:05,753 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.926e+01 9.389e+01 1.002e+02 1.280e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 10:48:27,272 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520450 2023-11-28 10:48:51,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3469773.3333333335, ans=0.125 2023-11-28 10:48:59,570 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3450, loss[loss=0.06844, simple_loss=0.09273, pruned_loss=0.01352, audio_tagging_loss=0.008554, over 16857.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09098, pruned_loss=0.01247, audio_tagging_loss=0.008682, over 3055428.90 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:49:24,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=12.0 2023-11-28 10:49:25,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520500 2023-11-28 10:49:45,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3470106.6666666665, ans=0.0 2023-11-28 10:49:49,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3470106.6666666665, ans=0.125 2023-11-28 10:49:58,023 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3500, loss[loss=0.0327, simple_loss=0.04025, pruned_loss=0.004016, audio_tagging_loss=0.00856, over 14254.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08993, pruned_loss=0.01227, audio_tagging_loss=0.008628, over 3045977.17 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:50:02,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.047e+01 9.689e+01 1.031e+02 1.305e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 10:50:23,618 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520550 2023-11-28 10:50:29,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3470306.6666666665, ans=0.0 2023-11-28 10:50:30,284 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:50:56,673 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3550, loss[loss=0.0588, simple_loss=0.08127, pruned_loss=0.01057, audio_tagging_loss=0.007597, over 14016.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09037, pruned_loss=0.01239, audio_tagging_loss=0.008479, over 3048343.53 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:50:57,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3470506.6666666665, ans=0.0 2023-11-28 10:50:59,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=15.0 2023-11-28 10:51:04,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3470506.6666666665, ans=0.2 2023-11-28 10:51:10,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3470573.3333333335, ans=0.95 2023-11-28 10:51:22,611 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520600 2023-11-28 10:51:42,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3470773.3333333335, ans=0.125 2023-11-28 10:51:46,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3470773.3333333335, ans=0.95 2023-11-28 10:51:55,126 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3600, loss[loss=0.07273, simple_loss=0.1002, pruned_loss=0.01636, audio_tagging_loss=0.006286, over 14045.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08928, pruned_loss=0.01229, audio_tagging_loss=0.008523, over 3046780.35 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:51:57,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3470840.0, ans=0.1 2023-11-28 10:52:00,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.557e+01 8.694e+01 9.447e+01 1.046e+02 1.297e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 10:52:02,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3470840.0, ans=0.125 2023-11-28 10:52:21,629 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520650 2023-11-28 10:52:39,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3471040.0, ans=0.125 2023-11-28 10:52:54,240 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3650, loss[loss=0.05066, simple_loss=0.06362, pruned_loss=0.008954, audio_tagging_loss=0.009897, over 15486.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08938, pruned_loss=0.01228, audio_tagging_loss=0.008484, over 3044525.46 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:53:05,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3471240.0, ans=0.0 2023-11-28 10:53:19,734 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520700 2023-11-28 10:53:39,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3471373.3333333335, ans=0.2 2023-11-28 10:53:52,249 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3700, loss[loss=0.06233, simple_loss=0.08529, pruned_loss=0.009585, audio_tagging_loss=0.01011, over 15305.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08933, pruned_loss=0.01223, audio_tagging_loss=0.008485, over 3048883.59 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:53:59,768 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.858e+01 9.302e+01 9.977e+01 1.303e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 10:54:13,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3471573.3333333335, ans=0.125 2023-11-28 10:54:19,220 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520750 2023-11-28 10:54:28,350 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:54:50,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3471840.0, ans=0.015 2023-11-28 10:54:51,696 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3750, loss[loss=0.06442, simple_loss=0.08119, pruned_loss=0.01195, audio_tagging_loss=0.01188, over 14640.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08951, pruned_loss=0.01231, audio_tagging_loss=0.00853, over 3048509.23 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:55:10,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3471906.6666666665, ans=0.0 2023-11-28 10:55:17,464 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520800 2023-11-28 10:55:21,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3471973.3333333335, ans=0.1 2023-11-28 10:55:35,444 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:55:51,424 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3800, loss[loss=0.06707, simple_loss=0.09293, pruned_loss=0.01139, audio_tagging_loss=0.009214, over 14987.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08981, pruned_loss=0.01237, audio_tagging_loss=0.008636, over 3042583.81 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:55:55,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=12.0 2023-11-28 10:55:58,018 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.010e+01 9.587e+01 1.023e+02 1.351e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 10:56:07,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3472240.0, ans=0.09899494936611666 2023-11-28 10:56:16,935 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520850 2023-11-28 10:56:23,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3472306.6666666665, ans=0.1 2023-11-28 10:56:34,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-11-28 10:56:41,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3472440.0, ans=0.2 2023-11-28 10:56:41,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2023-11-28 10:56:42,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3472440.0, ans=0.05 2023-11-28 10:56:43,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3472440.0, ans=0.5 2023-11-28 10:56:49,676 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3850, loss[loss=0.06653, simple_loss=0.08347, pruned_loss=0.01613, audio_tagging_loss=0.008662, over 14754.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08991, pruned_loss=0.01241, audio_tagging_loss=0.008667, over 3042878.52 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:56:51,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3472506.6666666665, ans=0.0 2023-11-28 10:57:15,571 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520900 2023-11-28 10:57:19,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2023-11-28 10:57:30,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3472706.6666666665, ans=0.0 2023-11-28 10:57:44,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3472773.3333333335, ans=0.1 2023-11-28 10:57:48,651 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3900, loss[loss=0.07797, simple_loss=0.1036, pruned_loss=0.01878, audio_tagging_loss=0.007404, over 13773.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09029, pruned_loss=0.01241, audio_tagging_loss=0.008709, over 3041952.20 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:57:56,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.789e+01 9.361e+01 1.021e+02 3.606e+02, threshold=1.872e+02, percent-clipped=1.0 2023-11-28 10:58:10,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3472906.6666666665, ans=0.2 2023-11-28 10:58:11,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3472973.3333333335, ans=0.02 2023-11-28 10:58:14,616 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520950 2023-11-28 10:58:43,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=12.0 2023-11-28 10:58:48,205 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3950, loss[loss=0.04507, simple_loss=0.05607, pruned_loss=0.005754, audio_tagging_loss=0.01128, over 14935.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09023, pruned_loss=0.01255, audio_tagging_loss=0.008738, over 3040458.42 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:58:50,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3473173.3333333335, ans=0.125 2023-11-28 10:58:54,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3473173.3333333335, ans=0.125 2023-11-28 10:59:12,829 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521000 2023-11-28 10:59:16,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3473306.6666666665, ans=0.125 2023-11-28 10:59:24,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5 2023-11-28 10:59:33,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3473373.3333333335, ans=15.0 2023-11-28 10:59:46,244 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4000, loss[loss=0.07415, simple_loss=0.1011, pruned_loss=0.01484, audio_tagging_loss=0.00877, over 15353.00 frames. 
], tot_loss[loss=0.06656, simple_loss=0.09051, pruned_loss=0.01251, audio_tagging_loss=0.008797, over 3036612.36 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:59:52,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.959e+01 9.483e+01 1.017e+02 1.499e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 10:59:53,214 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:59:53,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2023-11-28 10:59:58,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3473573.3333333335, ans=0.1 2023-11-28 10:59:58,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3473573.3333333335, ans=0.0 2023-11-28 11:00:00,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3473573.3333333335, ans=0.125 2023-11-28 11:00:12,041 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521050 2023-11-28 11:00:42,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-11-28 11:00:44,035 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4050, loss[loss=0.05161, simple_loss=0.06615, pruned_loss=0.007614, audio_tagging_loss=0.01092, over 14149.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09022, pruned_loss=0.01249, audio_tagging_loss=0.008824, over 3030440.68 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:00:50,376 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:00:50,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-28 11:00:56,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3473906.6666666665, ans=0.05 2023-11-28 11:01:10,336 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521100 2023-11-28 11:01:31,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3474106.6666666665, ans=0.125 2023-11-28 11:01:39,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3474106.6666666665, ans=0.125 2023-11-28 11:01:42,741 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4100, loss[loss=0.07508, simple_loss=0.09324, pruned_loss=0.01743, audio_tagging_loss=0.01102, over 15673.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09071, pruned_loss=0.01254, audio_tagging_loss=0.008839, over 3038152.56 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:01:43,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3474173.3333333335, ans=0.125 2023-11-28 11:01:51,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.779e+01 9.580e+01 1.037e+02 1.315e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 11:02:08,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521150 2023-11-28 11:02:09,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-28 11:02:19,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3474373.3333333335, ans=0.125 2023-11-28 11:02:38,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3474440.0, ans=0.0 2023-11-28 11:02:41,699 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4150, loss[loss=0.0731, simple_loss=0.09744, pruned_loss=0.0155, audio_tagging_loss=0.008874, over 14799.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09071, pruned_loss=0.01255, audio_tagging_loss=0.008754, over 3035065.07 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:02:46,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0 2023-11-28 11:02:55,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3474573.3333333335, ans=0.1 2023-11-28 11:02:55,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3474573.3333333335, ans=0.125 2023-11-28 11:02:56,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3474573.3333333335, ans=0.125 2023-11-28 11:03:02,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3474573.3333333335, ans=0.125 2023-11-28 11:03:07,959 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521200 2023-11-28 11:03:15,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3474640.0, ans=0.125 2023-11-28 11:03:20,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3474706.6666666665, ans=0.0 2023-11-28 11:03:28,255 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:03:40,537 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4200, loss[loss=0.07858, simple_loss=0.1092, pruned_loss=0.01496, audio_tagging_loss=0.00903, over 15643.00 frames. 
], tot_loss[loss=0.06631, simple_loss=0.09049, pruned_loss=0.0124, audio_tagging_loss=0.00867, over 3038795.27 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:03:42,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3474840.0, ans=0.0 2023-11-28 11:03:42,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3474840.0, ans=0.125 2023-11-28 11:03:44,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3474840.0, ans=0.0 2023-11-28 11:03:48,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3474840.0, ans=0.2 2023-11-28 11:03:49,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.843e+01 9.445e+01 1.017e+02 1.271e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 11:04:07,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521250 2023-11-28 11:04:09,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3474973.3333333335, ans=0.0 2023-11-28 11:04:10,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=12.0 2023-11-28 11:04:21,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2023-11-28 11:04:39,648 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4250, loss[loss=0.06838, simple_loss=0.09048, pruned_loss=0.01585, audio_tagging_loss=0.007289, over 14683.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09091, pruned_loss=0.01248, audio_tagging_loss=0.008551, over 3044211.03 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:04:59,324 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:05:05,800 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521300 2023-11-28 11:05:07,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3475306.6666666665, ans=22.5 2023-11-28 11:05:11,494 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:05:22,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3475373.3333333335, ans=0.04949747468305833 2023-11-28 11:05:26,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3475440.0, ans=0.125 2023-11-28 11:05:39,412 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4300, loss[loss=0.06169, simple_loss=0.08274, pruned_loss=0.01194, audio_tagging_loss=0.008379, over 15370.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09099, pruned_loss=0.01239, audio_tagging_loss=0.008561, over 3045033.15 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:05:47,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.879e+01 9.468e+01 1.032e+02 1.370e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 11:06:04,316 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521350 2023-11-28 11:06:27,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.91 vs. limit=10.0 2023-11-28 11:06:27,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5 2023-11-28 11:06:34,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3475773.3333333335, ans=0.1 2023-11-28 11:06:37,547 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4350, loss[loss=0.07841, simple_loss=0.1052, pruned_loss=0.01729, audio_tagging_loss=0.008532, over 14432.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09198, pruned_loss=0.01247, audio_tagging_loss=0.00849, over 3042333.26 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:06:43,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3475840.0, ans=0.125 2023-11-28 11:07:00,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3475973.3333333335, ans=0.125 2023-11-28 11:07:04,018 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521400 2023-11-28 11:07:11,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3475973.3333333335, ans=0.125 2023-11-28 11:07:13,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3476040.0, ans=0.0 2023-11-28 11:07:20,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3476040.0, ans=0.0 2023-11-28 11:07:23,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3476106.6666666665, ans=0.125 2023-11-28 11:07:24,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3476106.6666666665, ans=0.0 2023-11-28 11:07:36,268 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4400, loss[loss=0.04877, simple_loss=0.06641, pruned_loss=0.006965, audio_tagging_loss=0.008602, over 13999.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09151, pruned_loss=0.01243, audio_tagging_loss=0.008552, over 3051029.37 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:07:37,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3476173.3333333335, ans=0.125 2023-11-28 11:07:38,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3476173.3333333335, ans=0.0 2023-11-28 11:07:44,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.068e+01 9.728e+01 1.034e+02 1.377e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 11:07:56,634 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:08:02,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521450 2023-11-28 11:08:05,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3476306.6666666665, ans=0.0 2023-11-28 11:08:22,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3476440.0, ans=0.125 2023-11-28 11:08:35,687 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4450, loss[loss=0.07031, simple_loss=0.09409, pruned_loss=0.01534, audio_tagging_loss=0.007925, over 15246.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09076, pruned_loss=0.01245, audio_tagging_loss=0.008623, over 3052987.49 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:08:37,097 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:08:50,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.33 vs. limit=22.5 2023-11-28 11:09:00,793 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521500 2023-11-28 11:09:10,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3476706.6666666665, ans=0.0 2023-11-28 11:09:24,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3476773.3333333335, ans=0.125 2023-11-28 11:09:31,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3476773.3333333335, ans=0.125 2023-11-28 11:09:33,523 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4500, loss[loss=0.06766, simple_loss=0.08567, pruned_loss=0.01404, audio_tagging_loss=0.01079, over 15546.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09037, pruned_loss=0.01239, audio_tagging_loss=0.008573, over 3060160.59 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:09:41,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.818e+01 9.367e+01 9.979e+01 1.467e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 11:09:53,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3476906.6666666665, ans=0.125 2023-11-28 11:09:59,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521550 2023-11-28 11:10:19,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3477106.6666666665, ans=0.125 2023-11-28 11:10:32,143 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4550, loss[loss=0.1007, simple_loss=0.1391, pruned_loss=0.02458, audio_tagging_loss=0.006521, over 14827.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09045, pruned_loss=0.01236, audio_tagging_loss=0.008597, over 3048372.46 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:10:48,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3477240.0, ans=0.09899494936611666 2023-11-28 11:10:58,553 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521600 2023-11-28 11:11:04,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2023-11-28 11:11:07,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3477373.3333333335, ans=0.125 2023-11-28 11:11:21,733 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:11:23,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2023-11-28 11:11:25,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.71 vs. limit=15.0 2023-11-28 11:11:31,618 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4600, loss[loss=0.07205, simple_loss=0.09434, pruned_loss=0.01563, audio_tagging_loss=0.009253, over 15030.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09028, pruned_loss=0.01238, audio_tagging_loss=0.008678, over 3040664.67 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:11:39,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.873e+01 9.292e+01 1.017e+02 1.163e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 11:11:43,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3477573.3333333335, ans=0.0 2023-11-28 11:11:56,653 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521650 2023-11-28 11:12:06,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. 
limit=15.0 2023-11-28 11:12:20,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3477773.3333333335, ans=0.0 2023-11-28 11:12:27,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=3477773.3333333335, ans=12.0 2023-11-28 11:12:30,114 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4650, loss[loss=0.0513, simple_loss=0.06989, pruned_loss=0.007446, audio_tagging_loss=0.008914, over 14362.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08949, pruned_loss=0.01214, audio_tagging_loss=0.008733, over 3043937.33 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:12:33,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3477840.0, ans=0.1 2023-11-28 11:12:39,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3477840.0, ans=0.125 2023-11-28 11:12:46,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3477906.6666666665, ans=0.125 2023-11-28 11:12:55,418 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521700 2023-11-28 11:13:05,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3478040.0, ans=0.07 2023-11-28 11:13:12,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2023-11-28 11:13:21,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3478106.6666666665, ans=0.07 2023-11-28 11:13:28,660 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4700, loss[loss=0.06392, simple_loss=0.08871, pruned_loss=0.01162, audio_tagging_loss=0.007948, over 16119.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09006, pruned_loss=0.01236, audio_tagging_loss=0.008858, over 3044446.23 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:13:36,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.974e+01 9.921e+01 1.076e+02 1.441e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-28 11:13:55,075 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521750 2023-11-28 11:14:14,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3478440.0, ans=0.0 2023-11-28 11:14:27,482 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4750, loss[loss=0.05182, simple_loss=0.07558, pruned_loss=0.004792, audio_tagging_loss=0.009234, over 15566.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08942, pruned_loss=0.01218, audio_tagging_loss=0.008925, over 3044364.54 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:14:46,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3478573.3333333335, ans=0.0 2023-11-28 11:14:46,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. 
limit=10.0 2023-11-28 11:14:52,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521800 2023-11-28 11:14:57,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3478640.0, ans=0.125 2023-11-28 11:15:03,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478706.6666666665, ans=0.1 2023-11-28 11:15:08,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478706.6666666665, ans=0.1 2023-11-28 11:15:10,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3478706.6666666665, ans=0.125 2023-11-28 11:15:25,677 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4800, loss[loss=0.08196, simple_loss=0.1108, pruned_loss=0.01757, audio_tagging_loss=0.009001, over 14182.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08902, pruned_loss=0.01209, audio_tagging_loss=0.009066, over 3045384.10 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:15:29,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3478840.0, ans=0.1 2023-11-28 11:15:34,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.828e+01 9.577e+01 1.068e+02 1.342e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 11:15:34,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3478840.0, ans=0.125 2023-11-28 11:15:51,076 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521850 2023-11-28 11:16:01,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3479040.0, ans=0.2 2023-11-28 11:16:07,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3479040.0, ans=0.125 2023-11-28 11:16:13,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3479106.6666666665, ans=0.125 2023-11-28 11:16:17,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3479106.6666666665, ans=0.125 2023-11-28 11:16:23,914 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4850, loss[loss=0.07299, simple_loss=0.1045, pruned_loss=0.01447, audio_tagging_loss=0.006274, over 15656.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08888, pruned_loss=0.01215, audio_tagging_loss=0.009102, over 3042343.97 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:16:34,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3479240.0, ans=0.125 2023-11-28 11:16:35,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. 
limit=15.0 2023-11-28 11:16:42,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3479240.0, ans=0.125 2023-11-28 11:16:49,855 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521900 2023-11-28 11:17:22,681 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4900, loss[loss=0.0828, simple_loss=0.1161, pruned_loss=0.01839, audio_tagging_loss=0.00637, over 15685.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08934, pruned_loss=0.01212, audio_tagging_loss=0.009018, over 3037224.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:17:26,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3479506.6666666665, ans=0.0 2023-11-28 11:17:32,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.706e+01 9.491e+01 1.021e+02 1.931e+02, threshold=1.898e+02, percent-clipped=1.0 2023-11-28 11:17:34,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3479573.3333333335, ans=0.125 2023-11-28 11:17:48,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3479640.0, ans=0.0 2023-11-28 11:17:48,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=12.0 2023-11-28 11:17:48,975 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521950 2023-11-28 11:18:04,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3479706.6666666665, ans=0.1 2023-11-28 11:18:17,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2023-11-28 11:18:21,529 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4950, loss[loss=0.0548, simple_loss=0.08264, pruned_loss=0.007417, audio_tagging_loss=0.006064, over 16436.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08984, pruned_loss=0.01221, audio_tagging_loss=0.008807, over 3037354.46 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:18:40,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-28 11:18:47,260 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522000 2023-11-28 11:19:09,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-28 11:19:14,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3480106.6666666665, ans=0.125 2023-11-28 11:19:20,161 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5000, loss[loss=0.06669, simple_loss=0.09628, pruned_loss=0.01206, audio_tagging_loss=0.006487, over 14175.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0896, pruned_loss=0.01208, audio_tagging_loss=0.00865, over 3037302.03 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:19:24,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. 
limit=15.0 2023-11-28 11:19:29,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.777e+01 9.263e+01 9.841e+01 1.147e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 11:19:46,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522050 2023-11-28 11:19:54,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2023-11-28 11:19:58,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3480373.3333333335, ans=0.125 2023-11-28 11:20:13,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3480440.0, ans=0.0 2023-11-28 11:20:14,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3480440.0, ans=0.125 2023-11-28 11:20:18,901 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5050, loss[loss=0.05889, simple_loss=0.06603, pruned_loss=0.01566, audio_tagging_loss=0.01022, over 14770.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08932, pruned_loss=0.01201, audio_tagging_loss=0.00858, over 3041019.53 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:20:38,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3480573.3333333335, ans=0.0 2023-11-28 11:20:44,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522100 2023-11-28 11:20:44,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3480640.0, ans=0.0 2023-11-28 11:20:53,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2023-11-28 11:21:11,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3480773.3333333335, ans=0.0 2023-11-28 11:21:17,556 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5100, loss[loss=0.08456, simple_loss=0.1226, pruned_loss=0.01682, audio_tagging_loss=0.006419, over 15801.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09057, pruned_loss=0.0121, audio_tagging_loss=0.008505, over 3042571.90 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:21:26,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.858e+01 9.488e+01 1.012e+02 1.214e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 11:21:26,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3480840.0, ans=0.05 2023-11-28 11:21:34,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3480906.6666666665, ans=0.125 2023-11-28 11:21:43,427 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522150 2023-11-28 11:21:51,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2023-11-28 11:22:15,720 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5150, loss[loss=0.06328, simple_loss=0.08394, pruned_loss=0.008991, audio_tagging_loss=0.01232, over 14445.00 frames. 
], tot_loss[loss=0.06587, simple_loss=0.09058, pruned_loss=0.01204, audio_tagging_loss=0.008535, over 3041225.10 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:22:27,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3481240.0, ans=0.125 2023-11-28 11:22:28,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3481240.0, ans=0.2 2023-11-28 11:22:42,077 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522200 2023-11-28 11:23:08,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3481440.0, ans=0.125 2023-11-28 11:23:12,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-28 11:23:14,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=12.0 2023-11-28 11:23:14,771 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5200, loss[loss=0.0604, simple_loss=0.07751, pruned_loss=0.01185, audio_tagging_loss=0.009797, over 15479.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09051, pruned_loss=0.01206, audio_tagging_loss=0.008591, over 3038877.45 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:23:24,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.751e+01 9.601e+01 1.026e+02 1.242e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 11:23:39,995 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522250 2023-11-28 11:24:05,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3481773.3333333335, ans=0.125 2023-11-28 11:24:05,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3481773.3333333335, ans=0.125 2023-11-28 11:24:12,205 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5250, loss[loss=0.05717, simple_loss=0.07733, pruned_loss=0.008326, audio_tagging_loss=0.01018, over 15250.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09006, pruned_loss=0.01217, audio_tagging_loss=0.008636, over 3033894.65 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:24:37,369 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522300 2023-11-28 11:24:50,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3482040.0, ans=0.125 2023-11-28 11:25:01,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2023-11-28 11:25:07,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3482106.6666666665, ans=0.125 2023-11-28 11:25:09,471 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5300, loss[loss=0.06999, simple_loss=0.1015, pruned_loss=0.01228, audio_tagging_loss=0.006934, over 15082.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08993, pruned_loss=0.01224, audio_tagging_loss=0.008629, over 3038031.31 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:25:19,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.992e+01 9.491e+01 1.033e+02 1.599e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 11:25:35,810 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522350 2023-11-28 11:26:04,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3482440.0, ans=0.2 2023-11-28 11:26:07,640 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5350, loss[loss=0.06639, simple_loss=0.08661, pruned_loss=0.0141, audio_tagging_loss=0.008984, over 15178.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08985, pruned_loss=0.01216, audio_tagging_loss=0.008641, over 3036692.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:26:15,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-28 11:26:19,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-28 11:26:33,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522400 2023-11-28 11:26:35,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2023-11-28 11:26:43,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3482706.6666666665, ans=0.1 2023-11-28 11:26:49,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2023-11-28 11:26:53,856 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:27:07,429 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5400, loss[loss=0.05996, simple_loss=0.08749, pruned_loss=0.008046, audio_tagging_loss=0.008163, over 16972.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08955, pruned_loss=0.01211, audio_tagging_loss=0.008741, over 3035361.20 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:27:17,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.830e+01 9.403e+01 1.046e+02 1.380e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 11:27:19,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3482906.6666666665, ans=0.0 2023-11-28 11:27:31,911 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522450 2023-11-28 11:27:35,934 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:27:41,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3483040.0, ans=0.1 2023-11-28 11:27:47,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3483040.0, ans=0.2 2023-11-28 11:28:05,734 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5450, loss[loss=0.06659, simple_loss=0.08725, pruned_loss=0.01363, audio_tagging_loss=0.009327, over 14906.00 frames. 
], tot_loss[loss=0.06588, simple_loss=0.08942, pruned_loss=0.01234, audio_tagging_loss=0.008834, over 3037311.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:28:32,297 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522500 2023-11-28 11:28:50,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-28 11:28:57,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3483440.0, ans=0.125 2023-11-28 11:28:59,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5 2023-11-28 11:29:04,403 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5500, loss[loss=0.06006, simple_loss=0.08994, pruned_loss=0.006898, audio_tagging_loss=0.008195, over 15356.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08894, pruned_loss=0.01212, audio_tagging_loss=0.008773, over 3037394.34 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:29:05,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3483506.6666666665, ans=0.2 2023-11-28 11:29:11,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3483506.6666666665, ans=15.0 2023-11-28 11:29:15,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.610e+01 9.341e+01 1.002e+02 1.177e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 11:29:15,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2023-11-28 11:29:22,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2023-11-28 11:29:30,893 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522550 2023-11-28 11:29:31,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2023-11-28 11:29:41,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3483706.6666666665, ans=0.0 2023-11-28 11:29:41,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3483706.6666666665, ans=0.0 2023-11-28 11:29:43,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.20 vs. limit=15.0 2023-11-28 11:29:43,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3483706.6666666665, ans=0.125 2023-11-28 11:29:46,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.88 vs. 
limit=22.5 2023-11-28 11:29:47,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3483706.6666666665, ans=0.0 2023-11-28 11:29:57,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3483773.3333333335, ans=0.1 2023-11-28 11:30:04,936 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5550, loss[loss=0.07088, simple_loss=0.09936, pruned_loss=0.01414, audio_tagging_loss=0.007064, over 15843.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09026, pruned_loss=0.0124, audio_tagging_loss=0.008822, over 3042252.54 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:30:07,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3483840.0, ans=0.0 2023-11-28 11:30:24,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3483906.6666666665, ans=0.1 2023-11-28 11:30:28,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3483973.3333333335, ans=0.125 2023-11-28 11:30:29,944 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522600 2023-11-28 11:31:04,133 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5600, loss[loss=0.08223, simple_loss=0.1163, pruned_loss=0.01605, audio_tagging_loss=0.008022, over 16141.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09022, pruned_loss=0.01231, audio_tagging_loss=0.00894, over 3049193.80 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:31:14,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.030e+01 9.835e+01 1.064e+02 3.078e+02, threshold=1.967e+02, percent-clipped=1.0 2023-11-28 11:31:17,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3484240.0, ans=0.125 2023-11-28 11:31:26,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3484306.6666666665, ans=0.2 2023-11-28 11:31:29,387 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522650 2023-11-28 11:31:30,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3484306.6666666665, ans=0.1 2023-11-28 11:31:30,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3484306.6666666665, ans=0.2 2023-11-28 11:31:45,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3484373.3333333335, ans=0.125 2023-11-28 11:31:51,247 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:32:02,662 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5650, loss[loss=0.05112, simple_loss=0.0684, pruned_loss=0.007567, audio_tagging_loss=0.009352, over 15128.00 frames. 
], tot_loss[loss=0.06579, simple_loss=0.0893, pruned_loss=0.01216, audio_tagging_loss=0.008983, over 3049132.31 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:32:05,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3484506.6666666665, ans=0.125 2023-11-28 11:32:30,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522700 2023-11-28 11:32:42,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0 2023-11-28 11:32:52,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3484773.3333333335, ans=0.0 2023-11-28 11:33:02,818 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5700, loss[loss=0.0693, simple_loss=0.1005, pruned_loss=0.01299, audio_tagging_loss=0.006063, over 15350.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08867, pruned_loss=0.01194, audio_tagging_loss=0.008917, over 3050986.98 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:33:03,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3484840.0, ans=0.05 2023-11-28 11:33:05,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3484840.0, ans=0.1 2023-11-28 11:33:15,231 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.782e+01 9.296e+01 1.023e+02 1.172e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 11:33:19,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3484906.6666666665, ans=0.0 2023-11-28 11:33:27,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3484973.3333333335, ans=0.125 2023-11-28 11:33:28,851 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522750 2023-11-28 11:33:32,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3484973.3333333335, ans=0.0 2023-11-28 11:33:33,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3484973.3333333335, ans=0.0 2023-11-28 11:33:33,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3484973.3333333335, ans=0.09899494936611666 2023-11-28 11:33:59,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2023-11-28 11:34:00,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0 2023-11-28 11:34:02,527 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5750, loss[loss=0.07782, simple_loss=0.1122, pruned_loss=0.01424, audio_tagging_loss=0.007484, over 15597.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08791, pruned_loss=0.01171, audio_tagging_loss=0.008876, over 3045481.56 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:34:08,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3485173.3333333335, ans=0.125 2023-11-28 11:34:28,118 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522800 2023-11-28 11:34:38,580 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:34:50,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3485440.0, ans=0.5 2023-11-28 11:34:59,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2023-11-28 11:35:01,870 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5800, loss[loss=0.07957, simple_loss=0.107, pruned_loss=0.01909, audio_tagging_loss=0.006985, over 15039.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08815, pruned_loss=0.0116, audio_tagging_loss=0.008755, over 3044237.68 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:35:03,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2023-11-28 11:35:13,802 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.794e+01 9.521e+01 1.033e+02 1.295e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 11:35:19,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-11-28 11:35:28,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522850 2023-11-28 11:35:56,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3485773.3333333335, ans=0.125 2023-11-28 11:36:00,888 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5850, loss[loss=0.07053, simple_loss=0.09632, pruned_loss=0.01466, audio_tagging_loss=0.007704, over 14803.00 frames. ], tot_loss[loss=0.06363, simple_loss=0.08678, pruned_loss=0.01148, audio_tagging_loss=0.00876, over 3031040.48 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:36:03,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3485840.0, ans=0.09899494936611666 2023-11-28 11:36:04,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3485840.0, ans=0.2 2023-11-28 11:36:09,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3485840.0, ans=0.0 2023-11-28 11:36:11,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3485906.6666666665, ans=0.0 2023-11-28 11:36:13,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3485906.6666666665, ans=0.1 2023-11-28 11:36:26,656 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522900 2023-11-28 11:36:27,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. 
limit=6.0 2023-11-28 11:36:29,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3485973.3333333335, ans=0.2 2023-11-28 11:36:58,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-28 11:36:59,223 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5900, loss[loss=0.06147, simple_loss=0.08806, pruned_loss=0.0102, audio_tagging_loss=0.007239, over 15032.00 frames. ], tot_loss[loss=0.06414, simple_loss=0.08744, pruned_loss=0.01165, audio_tagging_loss=0.008776, over 3028217.20 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:37:11,244 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.935e+01 9.645e+01 1.023e+02 1.416e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 11:37:18,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3486240.0, ans=0.2 2023-11-28 11:37:25,645 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522950 2023-11-28 11:37:39,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3486373.3333333335, ans=10.0 2023-11-28 11:37:40,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3486373.3333333335, ans=0.025 2023-11-28 11:37:40,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3486373.3333333335, ans=0.0 2023-11-28 11:37:58,746 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5950, loss[loss=0.0591, simple_loss=0.08213, pruned_loss=0.01119, audio_tagging_loss=0.006839, over 15589.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.0886, pruned_loss=0.01174, audio_tagging_loss=0.008616, over 3032872.37 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:38:05,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3486506.6666666665, ans=0.125 2023-11-28 11:38:12,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2023-11-28 11:38:24,974 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523000 2023-11-28 11:38:35,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3486706.6666666665, ans=0.1 2023-11-28 11:38:41,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3486706.6666666665, ans=0.125 2023-11-28 11:38:49,583 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:38:57,774 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6000, loss[loss=0.05956, simple_loss=0.07955, pruned_loss=0.009202, audio_tagging_loss=0.01058, over 15233.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08826, pruned_loss=0.01168, audio_tagging_loss=0.008676, over 3036557.18 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:38:57,775 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 11:39:21,653 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4653, 3.8901, 3.1073, 3.8808], device='cuda:3') 2023-11-28 11:39:22,752 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4550, 3.7638, 4.3682, 3.5204], device='cuda:3') 2023-11-28 11:39:33,660 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05792, simple_loss=0.0506, pruned_loss=0.005293, audio_tagging_loss=0.02732, over 4681554.00 frames. 2023-11-28 11:39:33,661 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 11:39:45,289 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.849e+01 9.422e+01 1.008e+02 1.234e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 11:39:59,487 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523050 2023-11-28 11:40:07,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-11-28 11:40:09,293 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:40:15,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3487040.0, ans=0.125 2023-11-28 11:40:20,203 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:40:31,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3487173.3333333335, ans=0.05 2023-11-28 11:40:31,915 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6050, loss[loss=0.06865, simple_loss=0.08925, pruned_loss=0.01244, audio_tagging_loss=0.01159, over 15483.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08887, pruned_loss=0.01187, audio_tagging_loss=0.008656, over 3038703.16 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:40:48,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2023-11-28 11:40:55,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3487306.6666666665, ans=0.125 2023-11-28 11:40:58,466 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523100 2023-11-28 11:41:00,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2023-11-28 11:41:08,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.50 vs. 
2023-11-28 11:41:08,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=22.5
2023-11-28 11:41:25,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0
2023-11-28 11:41:31,062 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6100, loss[loss=0.07099, simple_loss=0.1035, pruned_loss=0.01183, audio_tagging_loss=0.007411, over 14536.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08845, pruned_loss=0.0118, audio_tagging_loss=0.008618, over 3038763.76 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 11:41:37,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3487506.6666666665, ans=0.125
2023-11-28 11:41:38,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0
2023-11-28 11:41:43,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.905e+01 9.501e+01 1.004e+02 1.216e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-28 11:41:48,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487573.3333333335, ans=0.1
2023-11-28 11:41:56,904 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523150
2023-11-28 11:42:10,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3487706.6666666665, ans=0.125
2023-11-28 11:42:15,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3487706.6666666665, ans=0.125
2023-11-28 11:42:24,848 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 11:42:30,251 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6150, loss[loss=0.07859, simple_loss=0.1133, pruned_loss=0.01766, audio_tagging_loss=0.004301, over 15787.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08953, pruned_loss=0.01204, audio_tagging_loss=0.008553, over 3042813.04 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
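The ScheduledFloat lines above track module hyper-parameters (dropout probabilities, skip rates, balancer bounds) that are annealed as a function of batch_count rather than held fixed; ans is the value in effect at that step. A simplified sketch of such a piecewise-linear schedule follows; the class is a stand-in for the one in icefall's scaling.py and the breakpoints are illustrative, not the recipe's. At a batch_count near 3.5M every schedule has long since reached its final value, which is why readings like ans=0.1 for feed_forward1.out_proj.dropout_p are constant here:

```python
class ScheduledFloat:
    """Piecewise-linear value over batch_count, e.g. (0, 0.3) -> (20000, 0.1)."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        (x0, y0), *rest = self.points
        if batch_count <= x0:
            return y0
        for x1, y1 in rest:
            if batch_count <= x1:
                # Linear interpolation between the neighbouring breakpoints.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint: hold the final value

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))  # illustrative breakpoints
print(dropout_p.value(3487573.33))  # 0.1, consistent with the ans=0.1 reading above
```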
2023-11-28 11:42:33,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0
2023-11-28 11:42:41,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3487906.6666666665, ans=0.2
2023-11-28 11:42:56,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523200
2023-11-28 11:42:57,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3487973.3333333335, ans=0.125
2023-11-28 11:43:01,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3487973.3333333335, ans=0.035
2023-11-28 11:43:01,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3487973.3333333335, ans=0.125
2023-11-28 11:43:07,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3488040.0, ans=0.0
2023-11-28 11:43:20,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3488106.6666666665, ans=0.125
2023-11-28 11:43:28,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=22.5
2023-11-28 11:43:28,693 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6200, loss[loss=0.06223, simple_loss=0.08442, pruned_loss=0.009192, audio_tagging_loss=0.01083, over 15712.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08863, pruned_loss=0.01206, audio_tagging_loss=0.008686, over 3039682.49 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 11:43:29,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5
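The Whitening lines come from diagnostic modules in scaling.py that discourage a layer's feature covariance from collapsing onto a few directions: metric summarises how uneven the covariance spectrum is, and a corrective gradient is applied only when it exceeds limit, which is why the log prints the two side by side (metric=14.48 vs. limit=22.5 means no correction was needed). A hedged sketch of one way to compute such a metric, assuming the eigenvalue-ratio definition below rather than the exact formula in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """How far activations are from 'white' (equal covariance eigenvalues).

    Returns E[lambda^2] / E[lambda]^2 over the eigenvalues of the feature
    covariance: 1.0 for perfectly white features, growing as the spectrum
    becomes more lopsided. A simplified stand-in for the logged metric.
    """
    x = x.reshape(-1, x.shape[-1])      # (frames, channels)
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]        # (channels, channels) covariance
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 384)              # near-white random features
print(float(whitening_metric(x)))       # ~1.4 (1 + 384/1000 sampling noise), far below 22.5
```

For num_groups > 1 the same statistic would be computed per channel group and averaged.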
2023-11-28 11:43:33,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0
2023-11-28 11:43:40,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3488240.0, ans=0.1
2023-11-28 11:43:41,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3488240.0, ans=0.0
2023-11-28 11:43:42,414 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.740e+01 9.407e+01 1.006e+02 1.193e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-28 11:43:47,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3488240.0, ans=0.125
2023-11-28 11:43:55,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523250
2023-11-28 11:43:56,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3488306.6666666665, ans=0.0
2023-11-28 11:43:57,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3488306.6666666665, ans=0.125
2023-11-28 11:44:10,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3488373.3333333335, ans=0.0
2023-11-28 11:44:22,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3488440.0, ans=0.125
2023-11-28 11:44:26,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3488440.0, ans=0.1
2023-11-28 11:44:27,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3488506.6666666665, ans=0.0
2023-11-28 11:44:27,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3488506.6666666665, ans=0.0
2023-11-28 11:44:28,637 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6250, loss[loss=0.03827, simple_loss=0.04966, pruned_loss=0.00452, audio_tagging_loss=0.008925, over 15681.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.0893, pruned_loss=0.01217, audio_tagging_loss=0.008719, over 3041521.07 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 11:44:36,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.18 vs.
limit=22.5 2023-11-28 11:44:38,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3488506.6666666665, ans=0.125 2023-11-28 11:44:51,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3488640.0, ans=0.125 2023-11-28 11:44:54,338 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523300 2023-11-28 11:44:54,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3488640.0, ans=0.1 2023-11-28 11:44:59,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3488640.0, ans=0.0 2023-11-28 11:45:04,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3488706.6666666665, ans=10.0 2023-11-28 11:45:17,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3488773.3333333335, ans=0.0 2023-11-28 11:45:18,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3488773.3333333335, ans=0.1 2023-11-28 11:45:27,601 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6300, loss[loss=0.06401, simple_loss=0.08535, pruned_loss=0.01244, audio_tagging_loss=0.008894, over 16165.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.089, pruned_loss=0.01222, audio_tagging_loss=0.008804, over 3045250.85 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:45:39,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3488906.6666666665, ans=0.125 2023-11-28 11:45:39,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.790e+01 9.480e+01 1.019e+02 1.243e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 11:45:53,518 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523350 2023-11-28 11:46:06,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3489040.0, ans=0.025 2023-11-28 11:46:08,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2023-11-28 11:46:18,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3489106.6666666665, ans=0.0 2023-11-28 11:46:25,309 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6350, loss[loss=0.05271, simple_loss=0.06959, pruned_loss=0.01003, audio_tagging_loss=0.007889, over 15217.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08793, pruned_loss=0.012, audio_tagging_loss=0.00902, over 3038950.87 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:46:45,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489240.0, ans=0.1 2023-11-28 11:46:51,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523400 2023-11-28 11:47:16,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.99 vs. 
limit=22.5 2023-11-28 11:47:23,871 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6400, loss[loss=0.06026, simple_loss=0.09273, pruned_loss=0.005633, audio_tagging_loss=0.008265, over 14924.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08768, pruned_loss=0.01203, audio_tagging_loss=0.009103, over 3038767.31 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:47:28,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3489506.6666666665, ans=0.0 2023-11-28 11:47:36,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.832e+01 9.473e+01 1.012e+02 1.860e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 11:47:40,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-28 11:47:49,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523450 2023-11-28 11:47:50,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-28 11:47:58,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3489706.6666666665, ans=0.0 2023-11-28 11:48:01,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3489706.6666666665, ans=0.125 2023-11-28 11:48:06,295 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:48:20,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3489773.3333333335, ans=0.125 2023-11-28 11:48:22,438 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6450, loss[loss=0.0566, simple_loss=0.0776, pruned_loss=0.009808, audio_tagging_loss=0.007988, over 15652.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08773, pruned_loss=0.01203, audio_tagging_loss=0.009162, over 3038255.07 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:48:40,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3489906.6666666665, ans=0.0 2023-11-28 11:48:43,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3489973.3333333335, ans=0.125 2023-11-28 11:48:47,549 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523500 2023-11-28 11:49:01,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3490040.0, ans=0.1 2023-11-28 11:49:05,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3490040.0, ans=0.1 2023-11-28 11:49:15,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3490106.6666666665, ans=0.1 2023-11-28 11:49:18,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3490106.6666666665, ans=0.95 2023-11-28 11:49:20,230 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6500, loss[loss=0.06293, simple_loss=0.08, pruned_loss=0.01419, audio_tagging_loss=0.008737, over 14648.00 frames. 
], tot_loss[loss=0.06512, simple_loss=0.08794, pruned_loss=0.01198, audio_tagging_loss=0.009164, over 3041384.44 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:49:20,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3490173.3333333335, ans=10.0 2023-11-28 11:49:22,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3490173.3333333335, ans=0.0 2023-11-28 11:49:26,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.97 vs. limit=6.0 2023-11-28 11:49:30,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3490240.0, ans=0.0 2023-11-28 11:49:33,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 8.967e+01 9.507e+01 1.009e+02 1.264e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 11:49:35,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3490240.0, ans=0.2 2023-11-28 11:49:46,617 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523550 2023-11-28 11:50:18,385 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6550, loss[loss=0.04459, simple_loss=0.05232, pruned_loss=0.007582, audio_tagging_loss=0.01084, over 13759.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08801, pruned_loss=0.01192, audio_tagging_loss=0.009018, over 3040635.64 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:50:32,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3490573.3333333335, ans=0.125 2023-11-28 11:50:44,079 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523600 2023-11-28 11:50:45,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3490640.0, ans=0.0 2023-11-28 11:51:03,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-11-28 11:51:16,490 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:51:16,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0 2023-11-28 11:51:17,298 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6600, loss[loss=0.0624, simple_loss=0.08226, pruned_loss=0.0139, audio_tagging_loss=0.007372, over 16772.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08936, pruned_loss=0.01205, audio_tagging_loss=0.008873, over 3042693.22 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:51:17,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3490840.0, ans=0.2 2023-11-28 11:51:30,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.875e+01 9.605e+01 1.016e+02 1.315e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 11:51:31,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. 
limit=15.0 2023-11-28 11:51:39,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3490973.3333333335, ans=0.0 2023-11-28 11:51:41,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523650 2023-11-28 11:51:47,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3490973.3333333335, ans=0.125 2023-11-28 11:52:14,925 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6650, loss[loss=0.07939, simple_loss=0.1173, pruned_loss=0.01486, audio_tagging_loss=0.00588, over 15974.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0901, pruned_loss=0.01226, audio_tagging_loss=0.008727, over 3045675.26 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:52:23,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3491173.3333333335, ans=0.125 2023-11-28 11:52:32,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-28 11:52:41,189 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523700 2023-11-28 11:52:49,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3491373.3333333335, ans=0.125 2023-11-28 11:52:59,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3491373.3333333335, ans=0.0 2023-11-28 11:53:02,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3491440.0, ans=0.125 2023-11-28 11:53:11,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3491440.0, ans=0.2 2023-11-28 11:53:12,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3491506.6666666665, ans=0.1 2023-11-28 11:53:13,428 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6700, loss[loss=0.05963, simple_loss=0.08085, pruned_loss=0.01174, audio_tagging_loss=0.007465, over 15829.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09053, pruned_loss=0.01244, audio_tagging_loss=0.008678, over 3041250.85 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:53:14,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2023-11-28 11:53:28,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.754e+01 9.466e+01 1.016e+02 1.269e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 11:53:38,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2023-11-28 11:53:39,720 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523750 2023-11-28 11:53:42,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3491640.0, ans=0.07 2023-11-28 11:53:43,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.91 vs. 
limit=10.0 2023-11-28 11:53:58,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3491706.6666666665, ans=0.125 2023-11-28 11:54:12,334 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6750, loss[loss=0.08984, simple_loss=0.126, pruned_loss=0.01807, audio_tagging_loss=0.008789, over 15070.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08983, pruned_loss=0.0124, audio_tagging_loss=0.008635, over 3035357.97 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:54:34,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3491973.3333333335, ans=0.0 2023-11-28 11:54:36,931 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523800 2023-11-28 11:54:58,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3492106.6666666665, ans=0.125 2023-11-28 11:55:10,805 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6800, loss[loss=0.06477, simple_loss=0.09828, pruned_loss=0.009561, audio_tagging_loss=0.006069, over 15430.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08959, pruned_loss=0.01233, audio_tagging_loss=0.008598, over 3040750.70 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:55:19,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=12.0 2023-11-28 11:55:19,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3492173.3333333335, ans=0.125 2023-11-28 11:55:24,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.914e+01 9.606e+01 1.021e+02 1.348e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 11:55:35,900 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523850 2023-11-28 11:55:52,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3492373.3333333335, ans=0.1 2023-11-28 11:56:03,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3492440.0, ans=0.05 2023-11-28 11:56:09,067 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6850, loss[loss=0.08512, simple_loss=0.1204, pruned_loss=0.01766, audio_tagging_loss=0.00727, over 16691.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09051, pruned_loss=0.01254, audio_tagging_loss=0.008557, over 3043414.19 frames. 
], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:56:18,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3492506.6666666665, ans=0.0 2023-11-28 11:56:22,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3492573.3333333335, ans=0.125 2023-11-28 11:56:29,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3492573.3333333335, ans=0.1 2023-11-28 11:56:35,866 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523900 2023-11-28 11:56:37,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3492640.0, ans=0.125 2023-11-28 11:56:38,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.62 vs. limit=10.0 2023-11-28 11:56:53,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2023-11-28 11:56:59,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3492773.3333333335, ans=0.125 2023-11-28 11:57:06,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3492840.0, ans=0.125 2023-11-28 11:57:07,909 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6900, loss[loss=0.08649, simple_loss=0.1185, pruned_loss=0.02111, audio_tagging_loss=0.006109, over 16749.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.0908, pruned_loss=0.01259, audio_tagging_loss=0.008532, over 3045693.06 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:57:14,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3492840.0, ans=0.125 2023-11-28 11:57:23,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.756e+01 9.577e+01 1.062e+02 1.292e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 11:57:33,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523950 2023-11-28 11:57:54,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3493106.6666666665, ans=0.1 2023-11-28 11:57:56,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3493106.6666666665, ans=0.125 2023-11-28 11:57:57,185 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:58:06,565 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6950, loss[loss=0.06095, simple_loss=0.07598, pruned_loss=0.01497, audio_tagging_loss=0.007984, over 15104.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09016, pruned_loss=0.01243, audio_tagging_loss=0.008535, over 3043272.88 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:58:07,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3493173.3333333335, ans=0.2 2023-11-28 11:58:08,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3493173.3333333335, ans=0.125 2023-11-28 11:58:20,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3493240.0, ans=0.125 2023-11-28 11:58:25,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3493240.0, ans=0.125 2023-11-28 11:58:25,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3493240.0, ans=0.125 2023-11-28 11:58:31,618 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524000 2023-11-28 11:58:37,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3493306.6666666665, ans=0.125 2023-11-28 11:58:57,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3493440.0, ans=0.125 2023-11-28 11:58:57,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3493440.0, ans=0.125 2023-11-28 11:59:06,775 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7000, loss[loss=0.06294, simple_loss=0.08593, pruned_loss=0.01218, audio_tagging_loss=0.007793, over 15545.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09016, pruned_loss=0.01243, audio_tagging_loss=0.008597, over 3042793.79 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:59:14,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2023-11-28 11:59:19,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3493573.3333333335, ans=0.0 2023-11-28 11:59:21,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.923e+01 9.384e+01 1.033e+02 1.328e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 11:59:24,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2023-11-28 11:59:32,795 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524050 2023-11-28 11:59:40,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3493640.0, ans=0.125 2023-11-28 12:00:05,203 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7050, loss[loss=0.06231, simple_loss=0.08903, pruned_loss=0.01036, audio_tagging_loss=0.007435, over 15102.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08981, pruned_loss=0.01239, audio_tagging_loss=0.00868, over 3035692.77 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:00:08,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3493840.0, ans=0.1 2023-11-28 12:00:31,265 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524100 2023-11-28 12:00:42,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2023-11-28 12:00:45,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3494040.0, ans=0.125 2023-11-28 12:00:55,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3494106.6666666665, ans=0.05 2023-11-28 12:01:03,882 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7100, loss[loss=0.06719, simple_loss=0.07937, pruned_loss=0.01564, audio_tagging_loss=0.01187, over 15060.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09008, pruned_loss=0.01238, audio_tagging_loss=0.008706, over 3046705.90 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:01:11,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-11-28 12:01:12,552 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:01:18,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.879e+01 9.631e+01 1.062e+02 1.360e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 12:01:19,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3494240.0, ans=0.125 2023-11-28 12:01:20,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-28 12:01:29,472 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524150 2023-11-28 12:01:39,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3494373.3333333335, ans=0.125 2023-11-28 12:01:41,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2023-11-28 12:01:46,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3494373.3333333335, ans=0.125 2023-11-28 12:02:01,618 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7150, loss[loss=0.08792, simple_loss=0.1211, pruned_loss=0.01937, audio_tagging_loss=0.007975, over 15497.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08986, pruned_loss=0.01233, audio_tagging_loss=0.008786, over 3046005.46 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:02:01,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3494506.6666666665, ans=10.0 2023-11-28 12:02:02,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. 
limit=10.0 2023-11-28 12:02:15,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3494573.3333333335, ans=0.1 2023-11-28 12:02:27,460 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524200 2023-11-28 12:02:54,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3494773.3333333335, ans=0.05 2023-11-28 12:02:55,790 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:02:59,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3494840.0, ans=0.125 2023-11-28 12:03:00,037 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7200, loss[loss=0.06936, simple_loss=0.08411, pruned_loss=0.01601, audio_tagging_loss=0.01129, over 14705.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08953, pruned_loss=0.0122, audio_tagging_loss=0.00889, over 3035736.99 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:03:04,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2023-11-28 12:03:08,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3494840.0, ans=0.125 2023-11-28 12:03:14,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3494906.6666666665, ans=0.0 2023-11-28 12:03:15,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.816e+01 9.709e+01 1.018e+02 1.271e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 12:03:17,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3494906.6666666665, ans=0.0 2023-11-28 12:03:18,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3494906.6666666665, ans=0.2 2023-11-28 12:03:25,929 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524250 2023-11-28 12:03:32,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3494973.3333333335, ans=0.125 2023-11-28 12:03:33,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3495040.0, ans=0.1 2023-11-28 12:03:53,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3495106.6666666665, ans=0.2 2023-11-28 12:03:57,753 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7250, loss[loss=0.05691, simple_loss=0.07296, pruned_loss=0.009575, audio_tagging_loss=0.01085, over 13217.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08929, pruned_loss=0.01219, audio_tagging_loss=0.008982, over 3031264.42 frames. 
], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:04:01,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3495173.3333333335, ans=0.125 2023-11-28 12:04:14,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3495240.0, ans=0.0 2023-11-28 12:04:23,233 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524300 2023-11-28 12:04:35,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3495373.3333333335, ans=0.07 2023-11-28 12:04:41,251 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:04:46,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=22.5 2023-11-28 12:04:56,005 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7300, loss[loss=0.073, simple_loss=0.1072, pruned_loss=0.01252, audio_tagging_loss=0.006876, over 15550.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0889, pruned_loss=0.01211, audio_tagging_loss=0.008833, over 3028448.27 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:05:11,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2023-11-28 12:05:12,053 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.741e+01 9.294e+01 1.019e+02 1.260e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 12:05:12,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2023-11-28 12:05:21,879 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524350 2023-11-28 12:05:41,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5 2023-11-28 12:05:54,074 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7350, loss[loss=0.07224, simple_loss=0.09477, pruned_loss=0.01675, audio_tagging_loss=0.008112, over 16256.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08973, pruned_loss=0.01225, audio_tagging_loss=0.008627, over 3038528.41 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:06:08,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-28 12:06:16,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0 2023-11-28 12:06:19,669 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524400 2023-11-28 12:06:32,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.10 vs. 
limit=10.0 2023-11-28 12:06:40,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3496106.6666666665, ans=0.125 2023-11-28 12:06:47,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3496106.6666666665, ans=0.0 2023-11-28 12:06:53,736 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7400, loss[loss=0.06865, simple_loss=0.08615, pruned_loss=0.01541, audio_tagging_loss=0.01016, over 14644.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08942, pruned_loss=0.0123, audio_tagging_loss=0.008536, over 3038545.92 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:06:59,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3496173.3333333335, ans=0.2 2023-11-28 12:07:09,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.769e+01 9.562e+01 1.042e+02 1.496e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 12:07:17,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3496306.6666666665, ans=0.125 2023-11-28 12:07:19,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524450 2023-11-28 12:07:34,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3496373.3333333335, ans=0.125 2023-11-28 12:07:39,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3496440.0, ans=0.0 2023-11-28 12:07:42,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3496440.0, ans=0.125 2023-11-28 12:07:50,863 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7450, loss[loss=0.06249, simple_loss=0.08767, pruned_loss=0.01091, audio_tagging_loss=0.007745, over 14673.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08902, pruned_loss=0.01218, audio_tagging_loss=0.008484, over 3038766.91 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:08:06,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3496573.3333333335, ans=0.125 2023-11-28 12:08:08,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2023-11-28 12:08:17,080 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524500 2023-11-28 12:08:36,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3496773.3333333335, ans=0.125 2023-11-28 12:08:38,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3496773.3333333335, ans=0.2 2023-11-28 12:08:43,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3496773.3333333335, ans=0.125 2023-11-28 12:08:43,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-11-28 12:08:49,245 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7500, loss[loss=0.06039, simple_loss=0.08141, pruned_loss=0.01078, audio_tagging_loss=0.008905, over 16072.00 frames. 
], tot_loss[loss=0.06507, simple_loss=0.08905, pruned_loss=0.01216, audio_tagging_loss=0.008388, over 3038760.47 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:08:50,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3496840.0, ans=0.125 2023-11-28 12:08:55,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3496840.0, ans=0.0 2023-11-28 12:09:01,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-28 12:09:05,328 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.807e+01 9.534e+01 1.017e+02 1.454e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 12:09:14,145 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524550 2023-11-28 12:09:16,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3496973.3333333335, ans=0.125 2023-11-28 12:09:42,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3497106.6666666665, ans=0.125 2023-11-28 12:09:43,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3497106.6666666665, ans=0.1 2023-11-28 12:09:46,881 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7550, loss[loss=0.0634, simple_loss=0.08175, pruned_loss=0.01147, audio_tagging_loss=0.01105, over 14790.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08848, pruned_loss=0.01206, audio_tagging_loss=0.008475, over 3031249.03 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:09:52,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3497173.3333333335, ans=0.2 2023-11-28 12:10:00,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3497240.0, ans=0.0 2023-11-28 12:10:00,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3497240.0, ans=0.2 2023-11-28 12:10:01,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3497240.0, ans=0.125 2023-11-28 12:10:03,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3497240.0, ans=0.125 2023-11-28 12:10:11,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524600 2023-11-28 12:10:20,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3497373.3333333335, ans=0.0 2023-11-28 12:10:20,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. 
limit=10.0 2023-11-28 12:10:38,559 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:10:38,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3497440.0, ans=0.0 2023-11-28 12:10:44,041 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7600, loss[loss=0.0676, simple_loss=0.09236, pruned_loss=0.01069, audio_tagging_loss=0.01073, over 15836.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.0894, pruned_loss=0.01208, audio_tagging_loss=0.00845, over 3036172.06 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:10:49,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3497506.6666666665, ans=0.2 2023-11-28 12:10:52,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3497506.6666666665, ans=0.0 2023-11-28 12:10:52,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2023-11-28 12:10:53,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2023-11-28 12:10:55,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3497573.3333333335, ans=0.0 2023-11-28 12:11:00,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 8.865e+01 9.544e+01 1.025e+02 1.373e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 12:11:03,740 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:11:07,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=15.0 2023-11-28 12:11:09,759 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524650 2023-11-28 12:11:25,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497706.6666666665, ans=0.1 2023-11-28 12:11:41,863 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7650, loss[loss=0.08108, simple_loss=0.1224, pruned_loss=0.01341, audio_tagging_loss=0.006465, over 14604.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08846, pruned_loss=0.01195, audio_tagging_loss=0.00843, over 3032243.61 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:12:08,108 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524700 2023-11-28 12:12:14,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3497973.3333333335, ans=0.125 2023-11-28 12:12:41,301 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7700, loss[loss=0.07979, simple_loss=0.1137, pruned_loss=0.01357, audio_tagging_loss=0.00939, over 15978.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08817, pruned_loss=0.0118, audio_tagging_loss=0.008516, over 3027886.37 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:12:51,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3498240.0, ans=0.125 2023-11-28 12:12:54,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3498240.0, ans=0.125 2023-11-28 12:12:57,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 8.890e+01 9.413e+01 1.018e+02 1.310e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 12:13:01,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3498240.0, ans=0.0 2023-11-28 12:13:05,501 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524750 2023-11-28 12:13:06,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2023-11-28 12:13:17,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3498373.3333333335, ans=0.04949747468305833 2023-11-28 12:13:22,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3498373.3333333335, ans=0.0 2023-11-28 12:13:25,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3498373.3333333335, ans=0.0 2023-11-28 12:13:31,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3498440.0, ans=0.0 2023-11-28 12:13:38,336 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7750, loss[loss=0.08194, simple_loss=0.1106, pruned_loss=0.01483, audio_tagging_loss=0.01179, over 14769.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08786, pruned_loss=0.01178, audio_tagging_loss=0.008582, over 3031822.82 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:13:40,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3498506.6666666665, ans=0.125 2023-11-28 12:13:50,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3498573.3333333335, ans=0.0 2023-11-28 12:14:03,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3498640.0, ans=0.125 2023-11-28 12:14:03,974 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524800 2023-11-28 12:14:19,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3498706.6666666665, ans=0.125 2023-11-28 12:14:23,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3498706.6666666665, ans=0.125 2023-11-28 12:14:35,911 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7800, loss[loss=0.07335, simple_loss=0.09637, pruned_loss=0.01551, audio_tagging_loss=0.009649, over 15058.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08865, pruned_loss=0.01206, audio_tagging_loss=0.00863, over 3034702.07 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:14:54,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 8.929e+01 9.549e+01 1.038e+02 1.507e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 12:15:02,421 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524850 2023-11-28 12:15:23,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3499106.6666666665, ans=0.0 2023-11-28 12:15:34,939 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7850, loss[loss=0.0757, simple_loss=0.101, pruned_loss=0.01539, audio_tagging_loss=0.00983, over 16488.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08829, pruned_loss=0.01199, audio_tagging_loss=0.008698, over 3042557.14 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:15:38,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3499173.3333333335, ans=0.035 2023-11-28 12:15:43,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3499173.3333333335, ans=0.0 2023-11-28 12:15:59,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524900 2023-11-28 12:16:27,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3499440.0, ans=0.07 2023-11-28 12:16:32,396 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7900, loss[loss=0.0643, simple_loss=0.09052, pruned_loss=0.01155, audio_tagging_loss=0.007497, over 14609.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08825, pruned_loss=0.01193, audio_tagging_loss=0.008777, over 3043566.24 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:16:35,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2023-11-28 12:16:35,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3499506.6666666665, ans=0.0 2023-11-28 12:16:38,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3499506.6666666665, ans=0.09899494936611666 2023-11-28 12:16:38,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.31 vs. 
limit=15.0 2023-11-28 12:16:39,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3499506.6666666665, ans=0.125 2023-11-28 12:16:41,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3499506.6666666665, ans=0.0 2023-11-28 12:16:42,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3499573.3333333335, ans=0.125 2023-11-28 12:16:49,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.884e+01 9.612e+01 1.005e+02 1.246e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 12:16:49,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3499573.3333333335, ans=0.0 2023-11-28 12:16:53,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3499573.3333333335, ans=0.125 2023-11-28 12:16:57,351 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524950 2023-11-28 12:16:57,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3499640.0, ans=0.07 2023-11-28 12:17:16,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3499706.6666666665, ans=0.0 2023-11-28 12:17:28,951 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7950, loss[loss=0.067, simple_loss=0.08662, pruned_loss=0.01377, audio_tagging_loss=0.009916, over 15024.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08796, pruned_loss=0.01178, audio_tagging_loss=0.008857, over 3043074.71 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:17:31,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3499840.0, ans=0.125 2023-11-28 12:17:40,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3499906.6666666665, ans=0.0 2023-11-28 12:17:44,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=22.5 2023-11-28 12:17:50,040 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:17:55,456 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525000 2023-11-28 12:18:04,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3500040.0, ans=0.125 2023-11-28 12:18:27,134 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8000, loss[loss=0.05851, simple_loss=0.07689, pruned_loss=0.01198, audio_tagging_loss=0.00808, over 14892.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08788, pruned_loss=0.01173, audio_tagging_loss=0.008868, over 3036818.84 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:18:38,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3500240.0, ans=0.0 2023-11-28 12:18:45,087 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.932e+01 9.362e+01 1.010e+02 1.203e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 12:18:51,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-11-28 12:18:52,810 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525050 2023-11-28 12:19:25,992 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8050, loss[loss=0.08291, simple_loss=0.1161, pruned_loss=0.01848, audio_tagging_loss=0.006405, over 14659.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08755, pruned_loss=0.01179, audio_tagging_loss=0.009013, over 3039482.09 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:19:27,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3500506.6666666665, ans=0.125 2023-11-28 12:19:29,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3500506.6666666665, ans=0.0 2023-11-28 12:19:30,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3500506.6666666665, ans=0.125 2023-11-28 12:19:50,895 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525100 2023-11-28 12:19:51,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3500640.0, ans=0.0 2023-11-28 12:19:52,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3500640.0, ans=0.125 2023-11-28 12:20:19,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3500773.3333333335, ans=0.2 2023-11-28 12:20:23,013 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8100, loss[loss=0.06778, simple_loss=0.09941, pruned_loss=0.0122, audio_tagging_loss=0.005881, over 15047.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08835, pruned_loss=0.01188, audio_tagging_loss=0.008928, over 3039276.47 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:20:36,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.87 vs. 
limit=22.5 2023-11-28 12:20:40,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.823e+01 9.483e+01 1.016e+02 1.288e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 12:20:43,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3500906.6666666665, ans=0.0 2023-11-28 12:20:48,911 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525150 2023-11-28 12:20:49,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3500973.3333333335, ans=0.2 2023-11-28 12:20:57,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3501040.0, ans=0.1 2023-11-28 12:21:00,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3501040.0, ans=0.125 2023-11-28 12:21:04,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3501040.0, ans=0.0 2023-11-28 12:21:19,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2023-11-28 12:21:20,698 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8150, loss[loss=0.05266, simple_loss=0.07959, pruned_loss=0.006462, audio_tagging_loss=0.006403, over 14165.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08924, pruned_loss=0.01199, audio_tagging_loss=0.008706, over 3042680.39 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:21:27,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3501173.3333333335, ans=0.1 2023-11-28 12:21:38,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3501240.0, ans=0.2 2023-11-28 12:21:45,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3501306.6666666665, ans=0.1 2023-11-28 12:21:46,477 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525200 2023-11-28 12:21:52,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3501306.6666666665, ans=0.0 2023-11-28 12:21:58,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3501373.3333333335, ans=0.125 2023-11-28 12:21:59,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3501373.3333333335, ans=0.2 2023-11-28 12:22:17,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3501440.0, ans=0.2 2023-11-28 12:22:19,384 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8200, loss[loss=0.06817, simple_loss=0.1097, pruned_loss=0.009538, audio_tagging_loss=0.003799, over 15580.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09035, pruned_loss=0.01222, audio_tagging_loss=0.008606, over 3049114.87 frames. 
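In every one of the recurring `optim.py` entries, the reported threshold is Clipping_scale times the median of the grad-norm quartiles (here 2.0 x 9.483e+01 = 1.897e+02), which suggests adaptive clipping against a running median of recent global gradient norms. A sketch under that assumption; the class and window size are hypothetical, not the actual ScaledAdam internals:

```python
import torch

# Hedged sketch: keep a window of recent global grad norms, set
# threshold = clipping_scale * median, clip above it, and track the
# fraction of clipped batches (the logged "percent-clipped").
class QuartileClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale, self.window = clipping_scale, window
        self.norms, self.clipped, self.seen = [], 0, 0

    def clip_(self, params) -> float:
        grads = [p.grad.norm() for p in params if p.grad is not None]
        norm = torch.norm(torch.stack(grads)).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()  # assumed: scale * median
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold
```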
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:22:21,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3501506.6666666665, ans=0.95 2023-11-28 12:22:25,425 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:22:36,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 8.686e+01 9.334e+01 1.013e+02 1.382e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 12:22:44,828 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525250 2023-11-28 12:22:47,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.51 vs. limit=10.0 2023-11-28 12:23:03,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3501706.6666666665, ans=0.0 2023-11-28 12:23:04,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3501773.3333333335, ans=0.125 2023-11-28 12:23:10,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3501773.3333333335, ans=0.1 2023-11-28 12:23:14,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3501773.3333333335, ans=0.1 2023-11-28 12:23:16,943 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8250, loss[loss=0.05803, simple_loss=0.07196, pruned_loss=0.009654, audio_tagging_loss=0.0124, over 14942.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08956, pruned_loss=0.01206, audio_tagging_loss=0.008613, over 3049837.79 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:23:29,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3501906.6666666665, ans=0.1 2023-11-28 12:23:37,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3501906.6666666665, ans=0.0 2023-11-28 12:23:42,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2023-11-28 12:23:42,755 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525300 2023-11-28 12:23:45,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3501973.3333333335, ans=0.125 2023-11-28 12:23:56,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3502040.0, ans=0.125 2023-11-28 12:23:59,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. 
limit=15.0 2023-11-28 12:24:14,675 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8300, loss[loss=0.07341, simple_loss=0.1043, pruned_loss=0.01321, audio_tagging_loss=0.008068, over 16450.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08889, pruned_loss=0.01187, audio_tagging_loss=0.008635, over 3049430.03 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:24:14,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3502173.3333333335, ans=0.125 2023-11-28 12:24:32,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2023-11-28 12:24:32,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.723e+01 9.443e+01 1.022e+02 1.604e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 12:24:37,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3502306.6666666665, ans=0.0 2023-11-28 12:24:40,414 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525350 2023-11-28 12:24:54,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2023-11-28 12:24:55,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3502373.3333333335, ans=0.0 2023-11-28 12:25:12,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3502506.6666666665, ans=0.1 2023-11-28 12:25:12,872 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8350, loss[loss=0.05269, simple_loss=0.0777, pruned_loss=0.005965, audio_tagging_loss=0.007877, over 15270.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08881, pruned_loss=0.01188, audio_tagging_loss=0.00852, over 3041312.65 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:25:18,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3502506.6666666665, ans=0.125 2023-11-28 12:25:37,654 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525400 2023-11-28 12:25:47,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3502706.6666666665, ans=0.125 2023-11-28 12:26:01,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3502773.3333333335, ans=0.125 2023-11-28 12:26:10,963 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8400, loss[loss=0.06437, simple_loss=0.08736, pruned_loss=0.01247, audio_tagging_loss=0.008223, over 15449.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08854, pruned_loss=0.01187, audio_tagging_loss=0.008528, over 3041696.14 frames. 
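The `scaling.py:213` entries print a ScheduledFloat value for each regularization knob (skip rates, balancer probabilities, dropout) as a function of the global batch_count. A plausible minimal implementation is a piecewise-linear schedule over batch count; the breakpoints below are purely illustrative, not the values used in this run:

```python
# Hedged sketch: a float hyperparameter interpolated piecewise-linearly
# against the global batch count, as the ScheduledFloat entries suggest.
class ScheduledFloat:
    def __init__(self, *points):  # points: (batch_count, value), sorted
        self.points = list(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# Illustrative breakpoints: skip rates decay to 0 early in training,
# matching the ans=0.0 values logged at batch_count ~ 3.5e6.
conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.05), (80000.0, 0.0))
print(conv_skip_rate.value(3502173.33))  # -> 0.0
```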
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:26:28,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3502906.6666666665, ans=15.0 2023-11-28 12:26:29,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.791e+01 9.357e+01 9.969e+01 1.892e+02, threshold=1.871e+02, percent-clipped=1.0 2023-11-28 12:26:36,636 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525450 2023-11-28 12:27:07,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2023-11-28 12:27:07,923 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8450, loss[loss=0.07394, simple_loss=0.1037, pruned_loss=0.01456, audio_tagging_loss=0.007522, over 15247.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08796, pruned_loss=0.01171, audio_tagging_loss=0.008514, over 3049071.14 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:27:23,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2023-11-28 12:27:27,091 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:27:29,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2023-11-28 12:27:33,475 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525500 2023-11-28 12:27:37,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3503306.6666666665, ans=0.5 2023-11-28 12:28:06,300 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8500, loss[loss=0.05674, simple_loss=0.0822, pruned_loss=0.008004, audio_tagging_loss=0.007635, over 15117.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08886, pruned_loss=0.01174, audio_tagging_loss=0.008458, over 3052312.11 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:28:06,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3503506.6666666665, ans=0.0 2023-11-28 12:28:24,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.935e+01 9.397e+01 1.006e+02 1.238e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 12:28:25,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.42 vs. limit=10.0 2023-11-28 12:28:31,024 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525550 2023-11-28 12:29:03,085 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8550, loss[loss=0.06191, simple_loss=0.07058, pruned_loss=0.01346, audio_tagging_loss=0.01317, over 15953.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08989, pruned_loss=0.0121, audio_tagging_loss=0.008462, over 3049740.26 frames. 
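The Whitening entries compare a per-module metric against a limit (e.g. metric=12.99 vs. limit=15.0, with a penalty presumably active only above the limit). One natural metric with this behavior is C * sum(l_i^2) / (sum l_i)^2 over the eigenvalues l_i of the feature covariance: it equals 1.0 when the covariance is isotropic (fully whitened) and grows as variance concentrates in few directions. The exact formula in scaling.py is an assumption here:

```python
import torch

# Hedged sketch: a whitening metric that is 1.0 for an isotropic feature
# covariance and grows as the spectrum becomes lopsided.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]          # (C, C) covariance
    num_channels = cov.shape[0]
    # C * tr(cov @ cov) / tr(cov)**2 == C * sum(l_i^2) / (sum l_i)^2
    return num_channels * (cov * cov).sum() / cov.trace() ** 2

torch.manual_seed(0)
white = torch.randn(50000, 64)
print(whitening_metric(white))  # ~1.0 for white noise, far below limit=15.0
```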
], batch size: 62, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:29:05,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3503840.0, ans=0.0 2023-11-28 12:29:24,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3503906.6666666665, ans=0.0 2023-11-28 12:29:24,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3503906.6666666665, ans=0.07 2023-11-28 12:29:28,487 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525600 2023-11-28 12:29:32,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3503973.3333333335, ans=0.125 2023-11-28 12:29:38,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2023-11-28 12:29:43,868 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:30:00,678 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8600, loss[loss=0.04435, simple_loss=0.05819, pruned_loss=0.005442, audio_tagging_loss=0.009809, over 14297.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08845, pruned_loss=0.01196, audio_tagging_loss=0.008605, over 3049882.82 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:30:19,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 8.911e+01 9.426e+01 1.012e+02 1.309e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 12:30:26,558 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525650 2023-11-28 12:30:31,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-11-28 12:30:33,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3504306.6666666665, ans=0.125 2023-11-28 12:30:54,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3504440.0, ans=0.125 2023-11-28 12:30:59,368 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8650, loss[loss=0.06153, simple_loss=0.08813, pruned_loss=0.007921, audio_tagging_loss=0.009541, over 15637.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08881, pruned_loss=0.01194, audio_tagging_loss=0.00863, over 3055566.89 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:30:59,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3504506.6666666665, ans=0.0 2023-11-28 12:31:24,143 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525700 2023-11-28 12:31:29,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3504640.0, ans=0.2 2023-11-28 12:31:37,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. 
limit=10.0 2023-11-28 12:31:44,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3504773.3333333335, ans=0.0 2023-11-28 12:31:55,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-11-28 12:31:56,479 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8700, loss[loss=0.0655, simple_loss=0.08725, pruned_loss=0.01338, audio_tagging_loss=0.00849, over 15250.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08899, pruned_loss=0.01202, audio_tagging_loss=0.008765, over 3049359.80 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:32:14,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.928e+01 9.510e+01 1.026e+02 1.329e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 12:32:21,824 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525750 2023-11-28 12:32:28,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2023-11-28 12:32:30,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3505040.0, ans=0.125 2023-11-28 12:32:44,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3505106.6666666665, ans=0.1 2023-11-28 12:32:48,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3505106.6666666665, ans=0.2 2023-11-28 12:32:53,030 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8750, loss[loss=0.06278, simple_loss=0.08875, pruned_loss=0.01046, audio_tagging_loss=0.007946, over 15347.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08965, pruned_loss=0.01205, audio_tagging_loss=0.00873, over 3046606.87 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:32:58,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2023-11-28 12:33:05,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-28 12:33:19,024 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525800 2023-11-28 12:33:25,050 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:33:29,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3505373.3333333335, ans=0.125 2023-11-28 12:33:46,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3505440.0, ans=0.0 2023-11-28 12:33:51,366 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8800, loss[loss=0.07456, simple_loss=0.1057, pruned_loss=0.01241, audio_tagging_loss=0.009299, over 15199.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0907, pruned_loss=0.01237, audio_tagging_loss=0.008851, over 3048227.92 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:33:51,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3505506.6666666665, ans=0.0 2023-11-28 12:34:01,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3505573.3333333335, ans=0.125 2023-11-28 12:34:04,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3505573.3333333335, ans=0.125 2023-11-28 12:34:09,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.660e+01 8.935e+01 9.643e+01 1.033e+02 1.238e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 12:34:09,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3505573.3333333335, ans=0.1 2023-11-28 12:34:16,043 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525850 2023-11-28 12:34:16,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3505640.0, ans=0.5 2023-11-28 12:34:16,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3505640.0, ans=0.125 2023-11-28 12:34:18,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3505640.0, ans=0.0 2023-11-28 12:34:33,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3505706.6666666665, ans=0.015 2023-11-28 12:34:42,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3505773.3333333335, ans=0.1 2023-11-28 12:34:48,667 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8850, loss[loss=0.07209, simple_loss=0.0935, pruned_loss=0.0175, audio_tagging_loss=0.007837, over 15104.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09055, pruned_loss=0.01228, audio_tagging_loss=0.008902, over 3045242.23 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:35:04,074 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:35:14,069 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525900 2023-11-28 12:35:34,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3506106.6666666665, ans=0.0 2023-11-28 12:35:34,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3506106.6666666665, ans=0.125 2023-11-28 12:35:44,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2023-11-28 12:35:45,402 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8900, loss[loss=0.06808, simple_loss=0.1061, pruned_loss=0.008445, audio_tagging_loss=0.006568, over 15025.00 frames. 
], tot_loss[loss=0.06643, simple_loss=0.09078, pruned_loss=0.01225, audio_tagging_loss=0.0088, over 3039590.05 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:35:50,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2023-11-28 12:35:52,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3506173.3333333335, ans=0.125 2023-11-28 12:35:58,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3506240.0, ans=0.0 2023-11-28 12:36:00,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3506240.0, ans=0.07 2023-11-28 12:36:00,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3506240.0, ans=0.125 2023-11-28 12:36:01,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2023-11-28 12:36:03,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3506240.0, ans=0.125 2023-11-28 12:36:04,590 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.621e+01 9.140e+01 9.989e+01 1.171e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 12:36:11,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525950 2023-11-28 12:36:25,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3506373.3333333335, ans=0.125 2023-11-28 12:36:38,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3506440.0, ans=0.2 2023-11-28 12:36:43,087 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8950, loss[loss=0.04294, simple_loss=0.05494, pruned_loss=0.008113, audio_tagging_loss=0.007353, over 13593.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08953, pruned_loss=0.01192, audio_tagging_loss=0.008651, over 3038487.17 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:36:45,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3506506.6666666665, ans=0.0 2023-11-28 12:36:58,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3506573.3333333335, ans=0.0 2023-11-28 12:36:58,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3506573.3333333335, ans=0.125 2023-11-28 12:37:07,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526000 2023-11-28 12:37:11,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3506640.0, ans=0.2 2023-11-28 12:37:40,498 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9000, loss[loss=0.04398, simple_loss=0.05433, pruned_loss=0.005736, audio_tagging_loss=0.01108, over 14102.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08986, pruned_loss=0.012, audio_tagging_loss=0.008511, over 3049038.69 frames. 
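Each loss entry decomposes into simple_loss, pruned_loss and audio_tagging_loss, and the logged totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (e.g. 0.5 * 0.09078 + 0.01225 + 0.0088 = 0.06644 against the logged tot_loss of 0.06643 at batch 8900). A sketch with those inferred scales; the true weighting lives in train_asr.py:

```python
# Hedged sketch: recombining the logged loss components with scales
# inferred from the totals (0.5 for the simple transducer loss, 1.0 for
# the audio-tagging distillation loss).
SIMPLE_LOSS_SCALE = 0.5          # inferred from the log, not confirmed
AUDIO_TAGGING_LOSS_SCALE = 1.0   # inferred from the log, not confirmed

def total_loss(simple_loss: float, pruned_loss: float,
               audio_tagging_loss: float) -> float:
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Epoch 44, batch 8900 running averages from the log:
print(total_loss(0.09078, 0.01225, 0.0088))  # 0.06644 ~ logged 0.06643
```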
], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:37:40,499 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 12:38:09,045 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2981, 4.2518, 4.4686, 4.4347], device='cuda:3') 2023-11-28 12:38:11,260 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0120, 5.8563, 5.6427, 5.5682], device='cuda:3') 2023-11-28 12:38:15,233 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05875, simple_loss=0.05057, pruned_loss=0.005344, audio_tagging_loss=0.02812, over 4681554.00 frames. 2023-11-28 12:38:15,234 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 12:38:21,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3506840.0, ans=0.0 2023-11-28 12:38:30,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506906.6666666665, ans=0.125 2023-11-28 12:38:35,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 8.830e+01 9.439e+01 1.037e+02 1.240e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 12:38:41,580 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526050 2023-11-28 12:38:52,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3507040.0, ans=0.125 2023-11-28 12:39:13,641 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9050, loss[loss=0.05811, simple_loss=0.08342, pruned_loss=0.008197, audio_tagging_loss=0.008202, over 14846.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.089, pruned_loss=0.01189, audio_tagging_loss=0.008565, over 3044237.76 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:39:23,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3507173.3333333335, ans=0.125 2023-11-28 12:39:25,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3507240.0, ans=0.2 2023-11-28 12:39:26,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3507240.0, ans=0.125 2023-11-28 12:39:38,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526100 2023-11-28 12:39:44,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3507306.6666666665, ans=0.0 2023-11-28 12:39:50,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3507373.3333333335, ans=0.0 2023-11-28 12:39:51,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3507373.3333333335, ans=0.125 2023-11-28 12:39:51,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3507373.3333333335, ans=0.125 2023-11-28 12:40:11,483 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9100, loss[loss=0.08247, simple_loss=0.1155, pruned_loss=0.01852, audio_tagging_loss=0.006189, over 14637.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08962, pruned_loss=0.01193, audio_tagging_loss=0.008502, over 3039362.44 frames. 
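During validation the `zipformer.py:1877` lines dump attn_weights_entropy per self-attention module, apparently one mean entropy per head (e.g. tensor([4.2981, 4.2518, 4.4686, 4.4347])). Uniform attention over W key positions gives entropy log(W), so values around 4 to 6 nats indicate fairly diffuse attention. A sketch of the diagnostic; the (heads, batch, query, key) layout is an assumption about the zipformer internals:

```python
import torch

# Hedged sketch: mean attention entropy per head, as a diagnostic like
# the attn_weights_entropy dumps logged at validation time.
def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, batch, tgt_len, src_len), rows sum to 1
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # entropy per query
    return ent.mean(dim=(1, 2))                     # one value per head

heads, batch, seq = 4, 2, 100
uniform = torch.full((heads, batch, seq, seq), 1.0 / seq)
print(attn_weights_entropy(uniform))  # ~4.605 per head == log(100)
```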
], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:40:15,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0 2023-11-28 12:40:21,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3507573.3333333335, ans=0.1 2023-11-28 12:40:30,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.014e+01 9.601e+01 1.029e+02 1.341e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 12:40:36,346 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526150 2023-11-28 12:40:44,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3507640.0, ans=0.0 2023-11-28 12:40:44,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.90 vs. limit=22.5 2023-11-28 12:40:47,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3507706.6666666665, ans=0.125 2023-11-28 12:41:08,305 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9150, loss[loss=0.05793, simple_loss=0.07257, pruned_loss=0.01156, audio_tagging_loss=0.01009, over 15175.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08904, pruned_loss=0.01181, audio_tagging_loss=0.00857, over 3043659.85 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:41:08,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3507840.0, ans=0.0 2023-11-28 12:41:14,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3507840.0, ans=0.125 2023-11-28 12:41:34,211 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526200 2023-11-28 12:41:40,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.26 vs. limit=10.0 2023-11-28 12:41:56,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3508106.6666666665, ans=0.0 2023-11-28 12:42:05,858 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9200, loss[loss=0.07201, simple_loss=0.08987, pruned_loss=0.0163, audio_tagging_loss=0.01078, over 14908.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08801, pruned_loss=0.01161, audio_tagging_loss=0.008592, over 3043449.43 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:42:09,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3508173.3333333335, ans=0.0 2023-11-28 12:42:25,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.609e+01 9.408e+01 1.003e+02 1.431e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 12:42:31,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526250 2023-11-28 12:42:34,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3508306.6666666665, ans=0.125 2023-11-28 12:42:38,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3508373.3333333335, ans=0.0 2023-11-28 12:42:41,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3508373.3333333335, ans=0.125 2023-11-28 12:42:50,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3508440.0, ans=0.125 2023-11-28 12:43:02,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3508506.6666666665, ans=0.125 2023-11-28 12:43:02,891 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9250, loss[loss=0.05414, simple_loss=0.06952, pruned_loss=0.01014, audio_tagging_loss=0.009245, over 14938.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08777, pruned_loss=0.01157, audio_tagging_loss=0.008508, over 3048698.41 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:43:25,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3508640.0, ans=0.2 2023-11-28 12:43:27,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526300 2023-11-28 12:43:31,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3508640.0, ans=0.1 2023-11-28 12:43:42,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2023-11-28 12:43:50,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3508773.3333333335, ans=0.0 2023-11-28 12:43:59,851 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9300, loss[loss=0.05491, simple_loss=0.07576, pruned_loss=0.008133, audio_tagging_loss=0.008895, over 14864.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08881, pruned_loss=0.01185, audio_tagging_loss=0.008564, over 3053118.16 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:44:01,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3508840.0, ans=0.125 2023-11-28 12:44:18,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3508906.6666666665, ans=0.125 2023-11-28 12:44:19,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.008e+01 9.881e+01 1.066e+02 1.464e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-28 12:44:25,832 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526350 2023-11-28 12:44:45,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.85 vs. limit=22.5 2023-11-28 12:44:47,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3509106.6666666665, ans=0.0 2023-11-28 12:44:57,015 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9350, loss[loss=0.06491, simple_loss=0.08443, pruned_loss=0.01328, audio_tagging_loss=0.009413, over 15270.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08922, pruned_loss=0.012, audio_tagging_loss=0.008566, over 3047913.76 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:44:58,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3509173.3333333335, ans=22.5 2023-11-28 12:45:14,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3509240.0, ans=0.125 2023-11-28 12:45:16,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5 2023-11-28 12:45:22,292 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526400 2023-11-28 12:45:26,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-11-28 12:45:38,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-11-28 12:45:55,588 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9400, loss[loss=0.06102, simple_loss=0.07982, pruned_loss=0.01126, audio_tagging_loss=0.009845, over 16275.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08995, pruned_loss=0.01215, audio_tagging_loss=0.008691, over 3048887.95 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:46:14,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.941e+01 9.559e+01 9.955e+01 1.222e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 12:46:20,456 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526450 2023-11-28 12:46:24,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-11-28 12:46:25,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.62 vs. 
limit=15.0 2023-11-28 12:46:36,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3509706.6666666665, ans=0.125 2023-11-28 12:46:37,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3509706.6666666665, ans=0.125 2023-11-28 12:46:39,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3509706.6666666665, ans=0.125 2023-11-28 12:46:52,480 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9450, loss[loss=0.06934, simple_loss=0.08521, pruned_loss=0.01755, audio_tagging_loss=0.009183, over 14472.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09028, pruned_loss=0.01231, audio_tagging_loss=0.008759, over 3055012.11 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:46:55,875 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:47:18,065 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526500 2023-11-28 12:47:21,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3509973.3333333335, ans=0.125 2023-11-28 12:47:49,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-28 12:47:50,095 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9500, loss[loss=0.07746, simple_loss=0.1031, pruned_loss=0.01768, audio_tagging_loss=0.008229, over 14473.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08956, pruned_loss=0.01223, audio_tagging_loss=0.008915, over 3048611.52 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:48:02,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3510240.0, ans=0.1 2023-11-28 12:48:10,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.845e+01 9.672e+01 1.033e+02 1.277e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 12:48:15,941 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526550 2023-11-28 12:48:25,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3510373.3333333335, ans=0.125 2023-11-28 12:48:25,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3510373.3333333335, ans=0.2 2023-11-28 12:48:42,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3510440.0, ans=0.125 2023-11-28 12:48:45,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3510440.0, ans=0.0 2023-11-28 12:48:48,233 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9550, loss[loss=0.06015, simple_loss=0.07601, pruned_loss=0.008564, audio_tagging_loss=0.01358, over 15045.00 frames. 
], tot_loss[loss=0.06584, simple_loss=0.08927, pruned_loss=0.01215, audio_tagging_loss=0.009056, over 3050121.48 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:48:57,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2023-11-28 12:49:07,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3510573.3333333335, ans=0.0 2023-11-28 12:49:13,676 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526600 2023-11-28 12:49:14,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3510640.0, ans=0.0 2023-11-28 12:49:20,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3510640.0, ans=0.125 2023-11-28 12:49:20,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3510640.0, ans=0.2 2023-11-28 12:49:24,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3510706.6666666665, ans=0.125 2023-11-28 12:49:38,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3510773.3333333335, ans=0.015 2023-11-28 12:49:43,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3510773.3333333335, ans=0.0 2023-11-28 12:49:46,375 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9600, loss[loss=0.07697, simple_loss=0.1093, pruned_loss=0.01344, audio_tagging_loss=0.008859, over 16472.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08931, pruned_loss=0.01197, audio_tagging_loss=0.009004, over 3053541.30 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:49:54,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3510840.0, ans=0.0 2023-11-28 12:50:02,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3510906.6666666665, ans=0.125 2023-11-28 12:50:06,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.780e+01 9.691e+01 1.026e+02 1.293e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 12:50:11,858 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526650 2023-11-28 12:50:16,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=10.0 2023-11-28 12:50:24,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3511040.0, ans=0.125 2023-11-28 12:50:27,469 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:50:29,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3511040.0, ans=0.125 2023-11-28 12:50:44,464 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9650, loss[loss=0.05849, simple_loss=0.07731, pruned_loss=0.01055, audio_tagging_loss=0.009282, over 15944.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08897, pruned_loss=0.01206, audio_tagging_loss=0.009036, over 3048120.36 frames. 
], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:50:55,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=15.0 2023-11-28 12:51:02,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3511240.0, ans=0.125 2023-11-28 12:51:07,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3511306.6666666665, ans=0.95 2023-11-28 12:51:09,390 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526700 2023-11-28 12:51:25,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=12.0 2023-11-28 12:51:37,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3511440.0, ans=0.0 2023-11-28 12:51:40,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3511440.0, ans=0.1 2023-11-28 12:51:41,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3511506.6666666665, ans=0.0 2023-11-28 12:51:42,567 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9700, loss[loss=0.0416, simple_loss=0.05715, pruned_loss=0.006098, audio_tagging_loss=0.006927, over 14922.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08958, pruned_loss=0.01236, audio_tagging_loss=0.008847, over 3046497.76 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:51:54,885 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:51:55,985 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:51:57,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3511573.3333333335, ans=0.125 2023-11-28 12:52:03,074 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.945e+01 9.456e+01 1.030e+02 1.271e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 12:52:07,529 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526750 2023-11-28 12:52:09,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3511640.0, ans=0.125 2023-11-28 12:52:39,623 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9750, loss[loss=0.08719, simple_loss=0.1209, pruned_loss=0.01915, audio_tagging_loss=0.007603, over 15320.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08974, pruned_loss=0.01236, audio_tagging_loss=0.008708, over 3050300.17 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:52:39,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3511840.0, ans=0.1 2023-11-28 12:52:57,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3511906.6666666665, ans=0.1 2023-11-28 12:53:05,021 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526800 2023-11-28 12:53:05,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3511973.3333333335, ans=0.125 2023-11-28 12:53:08,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3511973.3333333335, ans=0.025 2023-11-28 12:53:10,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3511973.3333333335, ans=0.125 2023-11-28 12:53:17,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. limit=10.0 2023-11-28 12:53:37,921 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9800, loss[loss=0.06586, simple_loss=0.09559, pruned_loss=0.01183, audio_tagging_loss=0.006225, over 14640.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08987, pruned_loss=0.01235, audio_tagging_loss=0.008513, over 3042171.72 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:53:40,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3512173.3333333335, ans=0.125 2023-11-28 12:53:50,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3512240.0, ans=0.125 2023-11-28 12:53:58,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.970e+01 9.668e+01 1.037e+02 1.358e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 12:54:02,864 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526850 2023-11-28 12:54:05,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-11-28 12:54:10,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3512373.3333333335, ans=0.1 2023-11-28 12:54:19,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3512373.3333333335, ans=15.0 2023-11-28 12:54:19,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2023-11-28 12:54:26,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3512440.0, ans=0.0 2023-11-28 12:54:33,062 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:54:35,743 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9850, loss[loss=0.06182, simple_loss=0.08346, pruned_loss=0.01118, audio_tagging_loss=0.008918, over 15391.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08998, pruned_loss=0.01222, audio_tagging_loss=0.008414, over 3044552.94 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:55:01,017 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526900 2023-11-28 12:55:01,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3512640.0, ans=0.125 2023-11-28 12:55:31,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3512773.3333333335, ans=0.125 2023-11-28 12:55:33,321 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9900, loss[loss=0.06199, simple_loss=0.07446, pruned_loss=0.01259, audio_tagging_loss=0.01217, over 15439.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08945, pruned_loss=0.0122, audio_tagging_loss=0.008485, over 3041805.42 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:55:42,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3512840.0, ans=0.125 2023-11-28 12:55:55,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 9.239e+01 9.931e+01 1.065e+02 1.438e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-28 12:55:58,680 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526950 2023-11-28 12:56:09,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3513040.0, ans=0.2 2023-11-28 12:56:10,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0 2023-11-28 12:56:15,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2023-11-28 12:56:19,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3513106.6666666665, ans=0.2 2023-11-28 12:56:28,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=22.5 2023-11-28 12:56:31,207 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9950, loss[loss=0.06954, simple_loss=0.1037, pruned_loss=0.01043, audio_tagging_loss=0.007274, over 14695.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08895, pruned_loss=0.01205, audio_tagging_loss=0.008526, over 3041096.01 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:56:56,758 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527000 2023-11-28 12:57:23,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3513440.0, ans=0.0 2023-11-28 12:57:29,193 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10000, loss[loss=0.06354, simple_loss=0.07769, pruned_loss=0.01175, audio_tagging_loss=0.01294, over 14670.00 frames. 
], tot_loss[loss=0.06491, simple_loss=0.08878, pruned_loss=0.01197, audio_tagging_loss=0.008557, over 3040695.84 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:57:33,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3513506.6666666665, ans=0.125 2023-11-28 12:57:41,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3513573.3333333335, ans=0.125 2023-11-28 12:57:43,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3513573.3333333335, ans=0.0 2023-11-28 12:57:50,542 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.894e+01 9.636e+01 1.033e+02 1.186e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 12:57:51,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3513640.0, ans=0.0 2023-11-28 12:57:53,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527050 2023-11-28 12:58:03,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.63 vs. limit=22.5 2023-11-28 12:58:20,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3513773.3333333335, ans=0.1 2023-11-28 12:58:23,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3513773.3333333335, ans=0.1 2023-11-28 12:58:24,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3513773.3333333335, ans=0.125 2023-11-28 12:58:26,166 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10050, loss[loss=0.06749, simple_loss=0.09083, pruned_loss=0.01225, audio_tagging_loss=0.00983, over 15105.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08852, pruned_loss=0.01195, audio_tagging_loss=0.008608, over 3037450.10 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:58:26,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3513840.0, ans=0.125 2023-11-28 12:58:49,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3513973.3333333335, ans=0.125 2023-11-28 12:58:50,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3513973.3333333335, ans=0.125 2023-11-28 12:58:51,644 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527100 2023-11-28 12:58:58,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3513973.3333333335, ans=0.125 2023-11-28 12:59:10,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3514106.6666666665, ans=0.0 2023-11-28 12:59:12,113 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:59:22,920 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10100, loss[loss=0.05792, simple_loss=0.0856, pruned_loss=0.008835, audio_tagging_loss=0.006283, over 15999.00 frames. 
], tot_loss[loss=0.06536, simple_loss=0.08917, pruned_loss=0.01217, audio_tagging_loss=0.008612, over 3045895.85 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:59:25,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3514173.3333333335, ans=0.015 2023-11-28 12:59:46,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.720e+01 9.623e+01 1.026e+02 1.280e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 12:59:49,341 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527150 2023-11-28 12:59:56,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3514306.6666666665, ans=0.125 2023-11-28 13:00:14,178 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:00:21,328 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10150, loss[loss=0.08646, simple_loss=0.1178, pruned_loss=0.019, audio_tagging_loss=0.008553, over 15243.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08958, pruned_loss=0.01226, audio_tagging_loss=0.008599, over 3045396.80 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:00:38,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3514573.3333333335, ans=0.0 2023-11-28 13:00:40,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3514573.3333333335, ans=0.125 2023-11-28 13:00:46,679 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527200 2023-11-28 13:00:48,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3514640.0, ans=0.125 2023-11-28 13:00:52,486 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:00:58,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3514706.6666666665, ans=0.02 2023-11-28 13:01:16,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3514773.3333333335, ans=0.0 2023-11-28 13:01:19,638 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10200, loss[loss=0.06578, simple_loss=0.09132, pruned_loss=0.01304, audio_tagging_loss=0.007078, over 15880.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08927, pruned_loss=0.01211, audio_tagging_loss=0.008701, over 3042152.14 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:01:28,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3514840.0, ans=0.02 2023-11-28 13:01:34,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.28 vs. limit=22.5 2023-11-28 13:01:41,247 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.683e+01 9.423e+01 1.021e+02 1.647e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 13:01:44,655 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527250 2023-11-28 13:01:45,677 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:01:47,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3514973.3333333335, ans=0.05 2023-11-28 13:02:11,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3515106.6666666665, ans=0.1 2023-11-28 13:02:16,829 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10250, loss[loss=0.07481, simple_loss=0.09935, pruned_loss=0.0151, audio_tagging_loss=0.01003, over 15223.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09036, pruned_loss=0.01233, audio_tagging_loss=0.00873, over 3045986.00 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:02:18,235 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:02:27,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3515240.0, ans=0.09899494936611666 2023-11-28 13:02:43,375 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527300 2023-11-28 13:02:56,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3515373.3333333335, ans=0.125 2023-11-28 13:03:03,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3515440.0, ans=0.0 2023-11-28 13:03:10,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3515440.0, ans=0.125 2023-11-28 13:03:14,509 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10300, loss[loss=0.06187, simple_loss=0.0836, pruned_loss=0.009815, audio_tagging_loss=0.01025, over 15798.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08963, pruned_loss=0.0123, audio_tagging_loss=0.008829, over 3049099.44 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:03:27,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3515573.3333333335, ans=10.0 2023-11-28 13:03:36,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.855e+01 9.649e+01 1.050e+02 1.403e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 13:03:40,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527350 2023-11-28 13:03:55,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3515706.6666666665, ans=0.125 2023-11-28 13:04:07,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3515773.3333333335, ans=0.125 2023-11-28 13:04:10,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3515773.3333333335, ans=0.0 2023-11-28 13:04:12,931 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10350, loss[loss=0.05983, simple_loss=0.07999, pruned_loss=0.01081, audio_tagging_loss=0.009025, over 15318.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08996, pruned_loss=0.01234, audio_tagging_loss=0.008948, over 3050918.29 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:04:20,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3515840.0, ans=0.125 2023-11-28 13:04:27,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3515906.6666666665, ans=0.125 2023-11-28 13:04:35,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3515973.3333333335, ans=0.125 2023-11-28 13:04:37,749 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527400 2023-11-28 13:04:37,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3515973.3333333335, ans=0.125 2023-11-28 13:04:44,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.34 vs. limit=5.0 2023-11-28 13:04:51,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3516040.0, ans=0.125 2023-11-28 13:05:07,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2023-11-28 13:05:10,550 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10400, loss[loss=0.08029, simple_loss=0.1007, pruned_loss=0.02224, audio_tagging_loss=0.007699, over 14379.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08995, pruned_loss=0.01235, audio_tagging_loss=0.009022, over 3048276.87 frames. 
], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:05:29,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3516240.0, ans=0.1 2023-11-28 13:05:32,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.002e+01 9.653e+01 1.031e+02 1.825e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 13:05:36,659 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527450 2023-11-28 13:05:36,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3516306.6666666665, ans=0.125 2023-11-28 13:05:40,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3516306.6666666665, ans=0.125 2023-11-28 13:05:47,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3516373.3333333335, ans=0.125 2023-11-28 13:06:05,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3516440.0, ans=0.0 2023-11-28 13:06:08,061 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10450, loss[loss=0.05474, simple_loss=0.07342, pruned_loss=0.00943, audio_tagging_loss=0.008601, over 15452.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08928, pruned_loss=0.01228, audio_tagging_loss=0.008994, over 3045057.77 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:06:33,664 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527500 2023-11-28 13:06:37,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3516640.0, ans=0.125 2023-11-28 13:06:39,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3516640.0, ans=0.1 2023-11-28 13:06:40,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2023-11-28 13:06:52,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3516706.6666666665, ans=0.125 2023-11-28 13:06:52,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3516706.6666666665, ans=0.125 2023-11-28 13:07:00,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2023-11-28 13:07:05,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3516840.0, ans=0.0 2023-11-28 13:07:06,805 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10500, loss[loss=0.05382, simple_loss=0.07057, pruned_loss=0.007663, audio_tagging_loss=0.01087, over 15159.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0893, pruned_loss=0.01226, audio_tagging_loss=0.008861, over 3051879.39 frames. 
], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:07:07,023 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:07:09,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3516840.0, ans=0.125 2023-11-28 13:07:10,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3516840.0, ans=0.04949747468305833 2023-11-28 13:07:11,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3516840.0, ans=0.125 2023-11-28 13:07:17,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0 2023-11-28 13:07:28,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.731e+01 9.359e+01 1.002e+02 1.371e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 13:07:31,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527550 2023-11-28 13:07:39,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3517040.0, ans=0.0 2023-11-28 13:08:04,089 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10550, loss[loss=0.05852, simple_loss=0.08826, pruned_loss=0.007825, audio_tagging_loss=0.006562, over 15601.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08865, pruned_loss=0.01202, audio_tagging_loss=0.008799, over 3054711.49 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:08:11,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3517173.3333333335, ans=0.07 2023-11-28 13:08:22,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3517240.0, ans=0.2 2023-11-28 13:08:28,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527600 2023-11-28 13:08:45,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3517373.3333333335, ans=0.125 2023-11-28 13:08:59,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3517440.0, ans=0.125 2023-11-28 13:09:01,977 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10600, loss[loss=0.06216, simple_loss=0.08952, pruned_loss=0.01059, audio_tagging_loss=0.006809, over 15157.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08927, pruned_loss=0.01224, audio_tagging_loss=0.008707, over 3053714.45 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:09:02,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3517506.6666666665, ans=0.0 2023-11-28 13:09:03,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3517506.6666666665, ans=0.125 2023-11-28 13:09:11,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3517506.6666666665, ans=0.2 2023-11-28 13:09:12,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3517573.3333333335, ans=0.0 2023-11-28 13:09:17,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3517573.3333333335, ans=0.0 2023-11-28 13:09:24,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.917e+01 9.595e+01 1.067e+02 1.545e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 13:09:27,970 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527650 2023-11-28 13:09:46,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3517706.6666666665, ans=0.125 2023-11-28 13:09:49,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.67 vs. limit=22.5 2023-11-28 13:09:50,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2023-11-28 13:09:55,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3517773.3333333335, ans=0.125 2023-11-28 13:10:00,406 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10650, loss[loss=0.06631, simple_loss=0.09288, pruned_loss=0.01124, audio_tagging_loss=0.008632, over 15798.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08962, pruned_loss=0.01218, audio_tagging_loss=0.008562, over 3055808.45 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:10:03,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3517840.0, ans=0.125 2023-11-28 13:10:11,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3517906.6666666665, ans=0.07 2023-11-28 13:10:25,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527700 2023-11-28 13:10:37,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3518040.0, ans=0.125 2023-11-28 13:10:57,981 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10700, loss[loss=0.05668, simple_loss=0.07388, pruned_loss=0.008922, audio_tagging_loss=0.01082, over 16332.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08952, pruned_loss=0.01218, audio_tagging_loss=0.008525, over 3050954.96 frames. 
], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:11:03,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3518173.3333333335, ans=0.125 2023-11-28 13:11:08,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2023-11-28 13:11:18,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3518240.0, ans=0.0 2023-11-28 13:11:19,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.935e+01 9.512e+01 1.031e+02 1.313e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 13:11:22,795 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527750 2023-11-28 13:11:41,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3518373.3333333335, ans=0.2 2023-11-28 13:11:43,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3518440.0, ans=0.035 2023-11-28 13:11:55,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.90 vs. limit=6.0 2023-11-28 13:11:55,921 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10750, loss[loss=0.06073, simple_loss=0.08857, pruned_loss=0.009631, audio_tagging_loss=0.006813, over 15657.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09005, pruned_loss=0.0122, audio_tagging_loss=0.008527, over 3050486.38 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:11:56,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3518506.6666666665, ans=0.0 2023-11-28 13:12:09,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3518573.3333333335, ans=0.1 2023-11-28 13:12:21,145 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527800 2023-11-28 13:12:21,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3518640.0, ans=0.0 2023-11-28 13:12:53,901 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10800, loss[loss=0.07489, simple_loss=0.1061, pruned_loss=0.01403, audio_tagging_loss=0.007824, over 16510.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08989, pruned_loss=0.01219, audio_tagging_loss=0.008494, over 3056680.51 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:13:15,947 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.825e+01 9.488e+01 1.009e+02 1.262e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 13:13:17,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3518973.3333333335, ans=0.1 2023-11-28 13:13:20,008 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527850 2023-11-28 13:13:37,198 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:13:51,770 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10850, loss[loss=0.05872, simple_loss=0.08569, pruned_loss=0.008693, audio_tagging_loss=0.007187, over 14861.00 frames. 
], tot_loss[loss=0.06513, simple_loss=0.08916, pruned_loss=0.01201, audio_tagging_loss=0.008538, over 3046763.55 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:14:17,152 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527900 2023-11-28 13:14:50,119 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10900, loss[loss=0.05094, simple_loss=0.06278, pruned_loss=0.009385, audio_tagging_loss=0.01016, over 14816.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08919, pruned_loss=0.01207, audio_tagging_loss=0.008609, over 3042788.55 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:14:51,254 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:15:03,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3519573.3333333335, ans=0.125 2023-11-28 13:15:09,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=22.5 2023-11-28 13:15:11,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.813e+01 9.594e+01 1.027e+02 1.260e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 13:15:15,293 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527950 2023-11-28 13:15:33,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.55 vs. limit=15.0 2023-11-28 13:15:44,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3519773.3333333335, ans=0.0 2023-11-28 13:15:47,462 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10950, loss[loss=0.06439, simple_loss=0.08496, pruned_loss=0.01345, audio_tagging_loss=0.008459, over 15395.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08843, pruned_loss=0.01202, audio_tagging_loss=0.008722, over 3040757.85 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:16:06,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3519906.6666666665, ans=15.0 2023-11-28 13:16:08,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. 
limit=15.0 2023-11-28 13:16:10,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3519973.3333333335, ans=0.2 2023-11-28 13:16:12,781 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528000 2023-11-28 13:16:23,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3520040.0, ans=0.125 2023-11-28 13:16:31,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3520040.0, ans=0.125 2023-11-28 13:16:37,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3520106.6666666665, ans=0.1 2023-11-28 13:16:41,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3520106.6666666665, ans=0.125 2023-11-28 13:16:47,527 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11000, loss[loss=0.08743, simple_loss=0.116, pruned_loss=0.02094, audio_tagging_loss=0.008501, over 15667.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08952, pruned_loss=0.01229, audio_tagging_loss=0.008736, over 3045006.26 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:16:54,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3520173.3333333335, ans=0.125 2023-11-28 13:17:00,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3520240.0, ans=0.125 2023-11-28 13:17:01,793 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:17:09,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.312e+01 9.741e+01 1.066e+02 1.982e+02, threshold=1.948e+02, percent-clipped=1.0 2023-11-28 13:17:12,891 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528050 2023-11-28 13:17:17,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3520306.6666666665, ans=0.07 2023-11-28 13:17:27,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3520373.3333333335, ans=0.125 2023-11-28 13:17:29,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3520373.3333333335, ans=0.125 2023-11-28 13:17:29,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3520373.3333333335, ans=0.05 2023-11-28 13:17:44,882 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11050, loss[loss=0.07276, simple_loss=0.08381, pruned_loss=0.02012, audio_tagging_loss=0.01073, over 14319.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.0891, pruned_loss=0.01221, audio_tagging_loss=0.008927, over 3053162.18 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:18:10,185 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528100 2023-11-28 13:18:12,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3520640.0, ans=0.125 2023-11-28 13:18:19,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3520706.6666666665, ans=0.1 2023-11-28 13:18:26,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3520706.6666666665, ans=0.2 2023-11-28 13:18:31,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3520773.3333333335, ans=0.125 2023-11-28 13:18:31,758 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:18:32,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3520773.3333333335, ans=0.0 2023-11-28 13:18:36,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3520773.3333333335, ans=0.0 2023-11-28 13:18:38,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3520773.3333333335, ans=0.5 2023-11-28 13:18:41,364 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11100, loss[loss=0.07169, simple_loss=0.09778, pruned_loss=0.01408, audio_tagging_loss=0.00872, over 14858.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08946, pruned_loss=0.01222, audio_tagging_loss=0.008937, over 3050678.56 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:19:00,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2023-11-28 13:19:04,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.992e+01 9.660e+01 1.067e+02 1.331e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 13:19:06,334 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528150 2023-11-28 13:19:15,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3521040.0, ans=0.09899494936611666 2023-11-28 13:19:39,154 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11150, loss[loss=0.07151, simple_loss=0.08801, pruned_loss=0.01651, audio_tagging_loss=0.01099, over 16320.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09011, pruned_loss=0.01239, audio_tagging_loss=0.009034, over 3050838.40 frames. 
], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:19:48,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3521173.3333333335, ans=0.0 2023-11-28 13:20:04,785 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528200 2023-11-28 13:20:23,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3521373.3333333335, ans=0.0 2023-11-28 13:20:28,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521440.0, ans=0.1 2023-11-28 13:20:28,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3521440.0, ans=0.0 2023-11-28 13:20:32,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3521440.0, ans=0.125 2023-11-28 13:20:33,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3521440.0, ans=0.125 2023-11-28 13:20:33,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3521440.0, ans=0.125 2023-11-28 13:20:36,832 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11200, loss[loss=0.07764, simple_loss=0.09783, pruned_loss=0.02009, audio_tagging_loss=0.008634, over 16167.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09011, pruned_loss=0.01224, audio_tagging_loss=0.009089, over 3051683.59 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:21:00,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 8.925e+01 9.638e+01 1.050e+02 1.394e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 13:21:03,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528250 2023-11-28 13:21:26,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.68 vs. limit=10.0 2023-11-28 13:21:35,061 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11250, loss[loss=0.06232, simple_loss=0.07828, pruned_loss=0.01276, audio_tagging_loss=0.01042, over 15860.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08897, pruned_loss=0.01209, audio_tagging_loss=0.00908, over 3051626.33 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:21:42,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3521840.0, ans=0.125 2023-11-28 13:21:43,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521840.0, ans=0.1 2023-11-28 13:21:45,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521840.0, ans=0.1 2023-11-28 13:21:47,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.48 vs. 
limit=15.0 2023-11-28 13:21:58,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3521973.3333333335, ans=0.125 2023-11-28 13:22:00,232 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528300 2023-11-28 13:22:06,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-28 13:22:11,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3522040.0, ans=0.125 2023-11-28 13:22:23,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2023-11-28 13:22:26,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3522106.6666666665, ans=0.5 2023-11-28 13:22:33,046 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11300, loss[loss=0.06405, simple_loss=0.09129, pruned_loss=0.01157, audio_tagging_loss=0.006837, over 14486.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08962, pruned_loss=0.01235, audio_tagging_loss=0.008822, over 3044281.13 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:22:41,098 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:22:48,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3522240.0, ans=0.1 2023-11-28 13:22:57,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.854e+01 9.489e+01 1.005e+02 1.713e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 13:22:57,985 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528350 2023-11-28 13:23:09,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3522373.3333333335, ans=0.125 2023-11-28 13:23:30,097 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11350, loss[loss=0.06796, simple_loss=0.09377, pruned_loss=0.01343, audio_tagging_loss=0.00765, over 15642.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08997, pruned_loss=0.01248, audio_tagging_loss=0.008664, over 3042089.91 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:23:37,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3522506.6666666665, ans=0.0 2023-11-28 13:23:40,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3522573.3333333335, ans=0.0 2023-11-28 13:23:46,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3522573.3333333335, ans=0.0 2023-11-28 13:23:50,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3522573.3333333335, ans=0.125 2023-11-28 13:23:51,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3522573.3333333335, ans=0.125 2023-11-28 13:23:56,587 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528400 2023-11-28 13:24:09,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3522706.6666666665, ans=0.0 2023-11-28 13:24:12,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3522706.6666666665, ans=0.125 2023-11-28 13:24:16,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3522773.3333333335, ans=0.125 2023-11-28 13:24:28,359 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11400, loss[loss=0.0713, simple_loss=0.1053, pruned_loss=0.01406, audio_tagging_loss=0.00459, over 14406.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09061, pruned_loss=0.01253, audio_tagging_loss=0.008623, over 3039097.12 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:24:43,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.13 vs. limit=10.0 2023-11-28 13:24:54,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 9.209e+01 9.932e+01 1.057e+02 3.089e+02, threshold=1.986e+02, percent-clipped=1.0 2023-11-28 13:24:54,134 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528450 2023-11-28 13:25:09,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3523040.0, ans=0.125 2023-11-28 13:25:27,088 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11450, loss[loss=0.04282, simple_loss=0.05284, pruned_loss=0.005312, audio_tagging_loss=0.01109, over 15730.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09011, pruned_loss=0.01238, audio_tagging_loss=0.008705, over 3035199.05 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:25:29,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3523173.3333333335, ans=0.125 2023-11-28 13:25:51,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528500 2023-11-28 13:25:54,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3523306.6666666665, ans=0.125 2023-11-28 13:25:58,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.00 vs. 
limit=22.5 2023-11-28 13:26:04,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3523373.3333333335, ans=0.035 2023-11-28 13:26:06,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3523373.3333333335, ans=0.125 2023-11-28 13:26:21,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3523440.0, ans=0.025 2023-11-28 13:26:24,137 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11500, loss[loss=0.05256, simple_loss=0.06927, pruned_loss=0.009098, audio_tagging_loss=0.008823, over 15334.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09009, pruned_loss=0.01238, audio_tagging_loss=0.008698, over 3040691.89 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:26:37,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3523573.3333333335, ans=0.125 2023-11-28 13:26:37,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3523573.3333333335, ans=0.0 2023-11-28 13:26:49,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-28 13:26:50,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.727e+01 9.422e+01 1.009e+02 1.518e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 13:26:50,405 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528550 2023-11-28 13:27:02,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3523706.6666666665, ans=0.125 2023-11-28 13:27:22,071 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11550, loss[loss=0.05525, simple_loss=0.07481, pruned_loss=0.007699, audio_tagging_loss=0.01015, over 15141.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09009, pruned_loss=0.01233, audio_tagging_loss=0.008628, over 3048567.26 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:27:22,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3523840.0, ans=0.125 2023-11-28 13:27:26,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3523840.0, ans=0.1 2023-11-28 13:27:47,953 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528600 2023-11-28 13:27:55,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3523973.3333333335, ans=0.1 2023-11-28 13:28:03,202 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 13:28:08,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3524106.6666666665, ans=0.0 2023-11-28 13:28:18,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3524106.6666666665, ans=0.125 2023-11-28 13:28:21,093 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11600, loss[loss=0.08001, simple_loss=0.1156, pruned_loss=0.01556, audio_tagging_loss=0.006652, over 15449.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.0907, pruned_loss=0.01245, audio_tagging_loss=0.00857, over 3050193.54 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:28:35,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3524240.0, ans=0.1 2023-11-28 13:28:36,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3524240.0, ans=0.0 2023-11-28 13:28:44,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.04 vs. limit=10.0 2023-11-28 13:28:45,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 9.016e+01 9.615e+01 1.039e+02 1.434e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-28 13:28:46,028 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528650 2023-11-28 13:28:49,551 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:29:03,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3524373.3333333335, ans=0.0 2023-11-28 13:29:09,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-28 13:29:10,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3524440.0, ans=0.0 2023-11-28 13:29:18,255 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11650, loss[loss=0.06744, simple_loss=0.0896, pruned_loss=0.01425, audio_tagging_loss=0.008389, over 15017.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09058, pruned_loss=0.0125, audio_tagging_loss=0.008567, over 3039451.64 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:29:20,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3524506.6666666665, ans=0.125 2023-11-28 13:29:22,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.77 vs. 
limit=15.0 2023-11-28 13:29:28,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3524573.3333333335, ans=0.125 2023-11-28 13:29:38,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3524573.3333333335, ans=0.125 2023-11-28 13:29:43,048 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528700 2023-11-28 13:29:46,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3524640.0, ans=0.1 2023-11-28 13:30:15,758 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11700, loss[loss=0.07597, simple_loss=0.1063, pruned_loss=0.01597, audio_tagging_loss=0.006862, over 15683.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08977, pruned_loss=0.0123, audio_tagging_loss=0.008625, over 3041734.38 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:30:26,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3524906.6666666665, ans=0.125 2023-11-28 13:30:32,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3524906.6666666665, ans=0.125 2023-11-28 13:30:41,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.706e+01 9.225e+01 1.001e+02 1.364e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 13:30:41,789 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528750 2023-11-28 13:30:47,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3524973.3333333335, ans=0.125 2023-11-28 13:30:57,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.06 vs. limit=10.0 2023-11-28 13:31:12,751 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11750, loss[loss=0.07357, simple_loss=0.09876, pruned_loss=0.0138, audio_tagging_loss=0.01039, over 14521.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09055, pruned_loss=0.0125, audio_tagging_loss=0.008649, over 3041207.78 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:31:14,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3525173.3333333335, ans=0.1 2023-11-28 13:31:21,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3525173.3333333335, ans=0.1 2023-11-28 13:31:24,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3525240.0, ans=0.1 2023-11-28 13:31:38,561 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528800 2023-11-28 13:31:40,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2023-11-28 13:31:47,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3525373.3333333335, ans=0.125 2023-11-28 13:32:01,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. 
limit=6.0 2023-11-28 13:32:11,777 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11800, loss[loss=0.04747, simple_loss=0.06241, pruned_loss=0.007449, audio_tagging_loss=0.008818, over 16574.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09087, pruned_loss=0.01245, audio_tagging_loss=0.008639, over 3046157.93 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:32:17,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2023-11-28 13:32:36,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 9.052e+01 9.553e+01 1.035e+02 2.670e+02, threshold=1.911e+02, percent-clipped=1.0 2023-11-28 13:32:36,508 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528850 2023-11-28 13:32:41,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-28 13:32:44,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2023-11-28 13:33:07,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3525840.0, ans=0.125 2023-11-28 13:33:09,422 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11850, loss[loss=0.06864, simple_loss=0.09049, pruned_loss=0.01482, audio_tagging_loss=0.008574, over 15501.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09052, pruned_loss=0.01273, audio_tagging_loss=0.008727, over 3047121.79 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:33:10,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3525840.0, ans=0.0 2023-11-28 13:33:18,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3525840.0, ans=0.125 2023-11-28 13:33:35,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528900 2023-11-28 13:33:47,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3526040.0, ans=0.0 2023-11-28 13:34:05,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3526173.3333333335, ans=0.125 2023-11-28 13:34:06,363 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11900, loss[loss=0.06718, simple_loss=0.09495, pruned_loss=0.01127, audio_tagging_loss=0.008439, over 15973.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09038, pruned_loss=0.01269, audio_tagging_loss=0.008837, over 3047175.12 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:34:09,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3526173.3333333335, ans=0.1 2023-11-28 13:34:16,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2023-11-28 13:34:29,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.76 vs. 
limit=15.0 2023-11-28 13:34:31,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.911e+01 9.791e+01 1.051e+02 1.188e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-28 13:34:32,083 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528950 2023-11-28 13:34:34,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2023-11-28 13:34:37,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3526306.6666666665, ans=0.05 2023-11-28 13:35:05,018 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11950, loss[loss=0.04331, simple_loss=0.06034, pruned_loss=0.003639, audio_tagging_loss=0.009494, over 14492.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08905, pruned_loss=0.01233, audio_tagging_loss=0.009058, over 3040990.85 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:35:15,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-11-28 13:35:24,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3526573.3333333335, ans=0.0 2023-11-28 13:35:29,918 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529000 2023-11-28 13:35:55,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3526773.3333333335, ans=0.125 2023-11-28 13:36:02,068 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 12000, loss[loss=0.05651, simple_loss=0.07081, pruned_loss=0.009265, audio_tagging_loss=0.01184, over 15012.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08872, pruned_loss=0.01222, audio_tagging_loss=0.009218, over 3042648.45 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:36:02,069 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 13:36:20,628 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.4814, 2.8798, 2.5959, 3.1657, 2.8745, 2.8578, 2.9058, 2.8361], device='cuda:3') 2023-11-28 13:36:37,269 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05811, simple_loss=0.05058, pruned_loss=0.005337, audio_tagging_loss=0.02748, over 4681554.00 frames. 2023-11-28 13:36:37,270 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 13:36:40,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2023-11-28 13:36:51,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3526906.6666666665, ans=0.95 2023-11-28 13:37:00,784 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529050 2023-11-28 13:37:01,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 8.972e+01 9.530e+01 1.015e+02 1.256e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 13:37:21,555 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 0, loss[loss=0.07317, simple_loss=0.08943, pruned_loss=0.01011, audio_tagging_loss=0.01835, over 15663.00 frames. 
], tot_loss[loss=0.07317, simple_loss=0.08943, pruned_loss=0.01011, audio_tagging_loss=0.01835, over 15663.00 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:37:21,556 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 13:37:50,972 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3411, 5.0210, 4.6676, 5.1816], device='cuda:3') 2023-11-28 13:37:56,009 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05764, simple_loss=0.05062, pruned_loss=0.005372, audio_tagging_loss=0.02696, over 4681554.00 frames. 2023-11-28 13:37:56,009 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 13:38:20,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3527146.6666666665, ans=0.125 2023-11-28 13:38:23,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3527146.6666666665, ans=0.125 2023-11-28 13:38:45,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3527280.0, ans=0.2 2023-11-28 13:38:48,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3527280.0, ans=0.125 2023-11-28 13:38:49,228 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529100 2023-11-28 13:38:53,504 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 50, loss[loss=0.07879, simple_loss=0.1018, pruned_loss=0.0125, audio_tagging_loss=0.0154, over 16191.00 frames. ], tot_loss[loss=0.0729, simple_loss=0.08844, pruned_loss=0.01198, audio_tagging_loss=0.01669, over 690607.70 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:38:56,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3527346.6666666665, ans=0.0 2023-11-28 13:39:06,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3527413.3333333335, ans=0.125 2023-11-28 13:39:10,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3527413.3333333335, ans=0.07 2023-11-28 13:39:19,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3527480.0, ans=0.125 2023-11-28 13:39:35,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3527546.6666666665, ans=0.125 2023-11-28 13:39:47,119 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529150 2023-11-28 13:39:49,231 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.341e+01 9.943e+01 1.065e+02 1.140e+02 1.453e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-28 13:39:51,483 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 100, loss[loss=0.07402, simple_loss=0.09588, pruned_loss=0.01117, audio_tagging_loss=0.01491, over 15641.00 frames. ], tot_loss[loss=0.07448, simple_loss=0.09174, pruned_loss=0.01278, audio_tagging_loss=0.01582, over 1216187.16 frames. 
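The tot_loss fields in these records decompose arithmetically: at epoch 45, batch 100 above, 0.07448 ≈ 0.5 × 0.09174 + 0.01278 + 0.01582, i.e. the simple transducer loss enters at half weight while the pruned and audio-tagging terms enter at full weight. A minimal sketch of that aggregation, with the scales inferred from the logged numbers rather than read from the recipe's flags:

```python
# Sketch: how the logged per-batch "tot_loss" plausibly aggregates its parts.
# Scales are inferred from the log records (0.5 * simple + 1.0 * pruned
# + 1.0 * audio_tagging); treat them as assumptions, not the recipe's flags.
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float) -> float:
    """Combine the three logged objectives into one training loss."""
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Reproduces the tot_loss record at epoch 45, batch 100:
assert abs(combine_losses(0.09174, 0.01278, 0.01582) - 0.07448) < 1e-4
```

The same identity holds, to rounding, for the other batch records in this section.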
], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:40:01,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3527680.0, ans=15.0 2023-11-28 13:40:08,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-28 13:40:11,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3527746.6666666665, ans=0.0 2023-11-28 13:40:22,263 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:40:24,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3527813.3333333335, ans=0.0 2023-11-28 13:40:25,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3527880.0, ans=0.0 2023-11-28 13:40:45,440 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529200 2023-11-28 13:40:49,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-28 13:40:50,297 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 150, loss[loss=0.07149, simple_loss=0.1013, pruned_loss=0.01109, audio_tagging_loss=0.009736, over 15324.00 frames. ], tot_loss[loss=0.0727, simple_loss=0.09184, pruned_loss=0.01264, audio_tagging_loss=0.01414, over 1622636.78 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:41:18,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-28 13:41:19,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-28 13:41:23,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2023-11-28 13:41:26,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3528213.3333333335, ans=0.2 2023-11-28 13:41:43,796 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529250 2023-11-28 13:41:46,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.992e+01 9.963e+01 1.064e+02 1.457e+02, threshold=1.993e+02, percent-clipped=0.0 2023-11-28 13:41:46,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3528280.0, ans=0.0 2023-11-28 13:41:48,764 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 200, loss[loss=0.06172, simple_loss=0.08522, pruned_loss=0.01065, audio_tagging_loss=0.008456, over 15054.00 frames. ], tot_loss[loss=0.07024, simple_loss=0.09073, pruned_loss=0.01234, audio_tagging_loss=0.01254, over 1940824.38 frames. 
], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:42:00,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3528413.3333333335, ans=0.125 2023-11-28 13:42:04,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3528413.3333333335, ans=0.125 2023-11-28 13:42:13,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=22.5 2023-11-28 13:42:41,605 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529300 2023-11-28 13:42:46,518 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 250, loss[loss=0.06411, simple_loss=0.08609, pruned_loss=0.01002, audio_tagging_loss=0.01105, over 14874.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.08985, pruned_loss=0.01207, audio_tagging_loss=0.01138, over 2190149.96 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:42:51,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3528680.0, ans=0.0 2023-11-28 13:42:55,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3528680.0, ans=0.2 2023-11-28 13:43:39,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529350 2023-11-28 13:43:42,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.918e+01 9.810e+01 1.066e+02 1.328e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-28 13:43:44,757 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 300, loss[loss=0.07717, simple_loss=0.1053, pruned_loss=0.01811, audio_tagging_loss=0.006426, over 15162.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09083, pruned_loss=0.01249, audio_tagging_loss=0.01065, over 2386298.69 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:43:54,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3529080.0, ans=0.5 2023-11-28 13:43:58,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3529080.0, ans=0.125 2023-11-28 13:44:03,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=22.5 2023-11-28 13:44:03,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=8.0 2023-11-28 13:44:04,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.91 vs. limit=22.5 2023-11-28 13:44:37,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529400 2023-11-28 13:44:42,416 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 350, loss[loss=0.06279, simple_loss=0.09055, pruned_loss=0.008685, audio_tagging_loss=0.008833, over 16322.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09146, pruned_loss=0.01246, audio_tagging_loss=0.009965, over 2539757.01 frames. 
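Each optim.py:476 record reports five quantiles (min, 25%, median, 75%, max) of recent gradient norms together with a clipping threshold, and in every record here the threshold is twice the logged median (e.g. 2 × 9.963e+01 ≈ 1.993e+02 under Clipping_scale=2.0), so clipping appears to be calibrated to a running median rather than a fixed constant. A rough sketch of that behaviour, not the recipe's actual optimizer code:

```python
# Sketch: median-calibrated gradient clipping, assuming the behaviour implied
# by the log ("Clipping_scale=2.0", threshold ~= 2 x median grad-norm).
# MedianGradClipper is a hypothetical helper, not an icefall class.
from collections import deque

import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)  # recent per-step global grad norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * float(
            torch.tensor(list(self.norms)).median())
        if norm > threshold:  # rescale so the global norm equals the threshold
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```

On this reading, the percent-clipped field would be the fraction of recent steps whose norm exceeded that moving threshold.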
], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:44:51,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3529346.6666666665, ans=0.04949747468305833 2023-11-28 13:44:58,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3529413.3333333335, ans=0.09899494936611666 2023-11-28 13:45:08,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3529480.0, ans=15.0 2023-11-28 13:45:09,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3529480.0, ans=0.125 2023-11-28 13:45:28,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3529613.3333333335, ans=0.2 2023-11-28 13:45:30,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2023-11-28 13:45:35,881 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529450 2023-11-28 13:45:38,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 9.086e+01 9.699e+01 1.038e+02 1.395e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-28 13:45:40,891 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 400, loss[loss=0.05592, simple_loss=0.06875, pruned_loss=0.007962, audio_tagging_loss=0.01358, over 14773.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09052, pruned_loss=0.01212, audio_tagging_loss=0.009682, over 2649758.38 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:45:45,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3529680.0, ans=0.125 2023-11-28 13:46:12,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3529813.3333333335, ans=0.025 2023-11-28 13:46:15,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3529880.0, ans=0.1 2023-11-28 13:46:23,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=12.0 2023-11-28 13:46:34,066 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529500 2023-11-28 13:46:35,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2023-11-28 13:46:37,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3530013.3333333335, ans=0.1 2023-11-28 13:46:38,927 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 450, loss[loss=0.06449, simple_loss=0.09842, pruned_loss=0.008689, audio_tagging_loss=0.00659, over 15455.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09042, pruned_loss=0.01215, audio_tagging_loss=0.009401, over 2739463.99 frames. 
], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:46:58,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3530080.0, ans=0.125 2023-11-28 13:47:27,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-28 13:47:32,229 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529550 2023-11-28 13:47:35,460 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.821e+01 9.442e+01 9.964e+01 1.327e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 13:47:36,639 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 500, loss[loss=0.07246, simple_loss=0.1019, pruned_loss=0.01314, audio_tagging_loss=0.008359, over 14549.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09039, pruned_loss=0.01214, audio_tagging_loss=0.009186, over 2808126.02 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:47:38,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3530346.6666666665, ans=0.0 2023-11-28 13:47:42,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3530346.6666666665, ans=0.5 2023-11-28 13:47:57,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3530413.3333333335, ans=0.0 2023-11-28 13:48:10,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3530546.6666666665, ans=0.0 2023-11-28 13:48:29,485 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529600 2023-11-28 13:48:30,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.18 vs. limit=22.5 2023-11-28 13:48:30,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3530613.3333333335, ans=0.125 2023-11-28 13:48:32,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3530613.3333333335, ans=0.0 2023-11-28 13:48:33,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3530680.0, ans=0.5 2023-11-28 13:48:34,705 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 550, loss[loss=0.06444, simple_loss=0.08419, pruned_loss=0.01356, audio_tagging_loss=0.008791, over 15095.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08937, pruned_loss=0.012, audio_tagging_loss=0.009076, over 2855243.56 frames. 
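The recurring scaling.py:213 lines show regularisation constants (dropout_p, skip rates, balancer probs) being recomputed from batch_count at every log point, i.e. a ScheduledFloat is a value scheduled over training progress rather than a fixed hyperparameter. A plausible minimal model is piecewise-linear interpolation over batch count; the breakpoints below are invented for illustration, and only the long-run value 0.1 matches the ans=0.1 logged for the dropout_p entries above:

```python
# Sketch: a piecewise-linear float schedule over batch_count, in the spirit of
# the ScheduledFloat values logged above. Breakpoints are illustrative only.
import bisect

class ScheduledFloat:
    def __init__(self, *points):  # points: (batch_count, value), ascending
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]  # past the last breakpoint: hold final value
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3530013.33))  # far past the final point -> 0.1, as logged
```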
], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:48:45,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3530746.6666666665, ans=0.125 2023-11-28 13:48:57,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3530813.3333333335, ans=0.125 2023-11-28 13:49:23,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3530946.6666666665, ans=0.05 2023-11-28 13:49:28,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529650 2023-11-28 13:49:30,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3530946.6666666665, ans=0.1 2023-11-28 13:49:31,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.881e+01 9.298e+01 9.934e+01 2.506e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-28 13:49:33,538 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 600, loss[loss=0.07101, simple_loss=0.1017, pruned_loss=0.01174, audio_tagging_loss=0.008389, over 16097.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08863, pruned_loss=0.01194, audio_tagging_loss=0.009089, over 2886403.47 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:49:50,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3531080.0, ans=0.125 2023-11-28 13:49:56,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3531146.6666666665, ans=0.125 2023-11-28 13:49:58,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3531146.6666666665, ans=0.0 2023-11-28 13:50:04,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3531146.6666666665, ans=0.0 2023-11-28 13:50:12,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3531213.3333333335, ans=0.1 2023-11-28 13:50:18,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3531213.3333333335, ans=0.125 2023-11-28 13:50:27,210 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529700 2023-11-28 13:50:31,544 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 650, loss[loss=0.07434, simple_loss=0.1025, pruned_loss=0.01479, audio_tagging_loss=0.008308, over 16318.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08861, pruned_loss=0.01202, audio_tagging_loss=0.009078, over 2925508.46 frames. 
], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:50:33,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3531346.6666666665, ans=0.0 2023-11-28 13:50:36,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-28 13:50:40,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-28 13:50:57,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3531480.0, ans=0.07 2023-11-28 13:51:24,491 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529750 2023-11-28 13:51:24,690 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:51:25,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3531613.3333333335, ans=0.125 2023-11-28 13:51:26,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3531613.3333333335, ans=0.125 2023-11-28 13:51:27,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.012e+01 9.762e+01 1.029e+02 1.844e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 13:51:28,836 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 700, loss[loss=0.05139, simple_loss=0.06748, pruned_loss=0.007145, audio_tagging_loss=0.01051, over 14697.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08842, pruned_loss=0.01194, audio_tagging_loss=0.009011, over 2950011.08 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:51:38,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3531680.0, ans=0.1 2023-11-28 13:51:40,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3531746.6666666665, ans=0.125 2023-11-28 13:51:48,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3531746.6666666665, ans=0.1 2023-11-28 13:51:49,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3531746.6666666665, ans=0.125 2023-11-28 13:52:06,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531880.0, ans=0.1 2023-11-28 13:52:10,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3531880.0, ans=0.125 2023-11-28 13:52:22,761 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529800 2023-11-28 13:52:28,095 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 750, loss[loss=0.06841, simple_loss=0.09866, pruned_loss=0.008183, audio_tagging_loss=0.0109, over 15712.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08894, pruned_loss=0.01192, audio_tagging_loss=0.008925, over 2973378.75 frames. 
], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:52:28,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-11-28 13:52:42,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3532080.0, ans=0.125 2023-11-28 13:52:49,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3532080.0, ans=0.125 2023-11-28 13:52:49,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3532080.0, ans=0.125 2023-11-28 13:53:00,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3532146.6666666665, ans=0.0 2023-11-28 13:53:22,426 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529850 2023-11-28 13:53:25,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.782e+01 9.191e+01 9.653e+01 1.030e+02 1.250e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 13:53:26,932 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 800, loss[loss=0.06418, simple_loss=0.08642, pruned_loss=0.00905, audio_tagging_loss=0.01192, over 16531.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08957, pruned_loss=0.01209, audio_tagging_loss=0.008894, over 2996011.99 frames. ], batch size: 65, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:53:28,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3532346.6666666665, ans=0.0 2023-11-28 13:53:33,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.37 vs. limit=22.5 2023-11-28 13:53:37,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2023-11-28 13:53:46,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3532413.3333333335, ans=0.125 2023-11-28 13:54:00,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532546.6666666665, ans=0.1 2023-11-28 13:54:12,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-28 13:54:16,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-28 13:54:17,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2023-11-28 13:54:20,017 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529900 2023-11-28 13:54:21,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532613.3333333335, ans=0.1 2023-11-28 13:54:24,379 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 850, loss[loss=0.05303, simple_loss=0.07124, pruned_loss=0.007425, audio_tagging_loss=0.009984, over 14735.00 frames. 
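The Whitening lines compare a per-module statistic against a limit (e.g. metric=10.25 vs. limit=15.0 above), a diagnostic of how far a module's activation covariance is from being proportional to the identity. One standard way to write such a metric is mean(λ²)/mean(λ)² over the eigenvalues λ of the per-group feature covariance, which equals 1.0 exactly for white features and grows with anisotropy; the sketch below assumes that form, which may differ in detail from the module's actual computation:

```python
# Sketch: a whiteness diagnostic, assuming metric = mean(eig^2) / mean(eig)^2
# of the per-group activation covariance (>= 1.0, with equality iff the
# covariance is a multiple of the identity). Illustrative only.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups as in the log
    (t, c) = x.shape
    x = x.reshape(t, num_groups, c // num_groups).transpose(0, 1)  # (g,t,c/g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / t   # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)              # symmetric -> real eigenvalues
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

x = torch.randn(1000, 384)
# Slightly above 1.0 for white input (sampling noise adds ~ channels/frames):
print(whitening_metric(x, num_groups=1))
```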
], tot_loss[loss=0.06589, simple_loss=0.09007, pruned_loss=0.01201, audio_tagging_loss=0.008844, over 3013709.55 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:54:31,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3532680.0, ans=0.125 2023-11-28 13:54:34,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3532746.6666666665, ans=0.0 2023-11-28 13:54:41,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3532746.6666666665, ans=0.125 2023-11-28 13:55:17,953 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529950 2023-11-28 13:55:22,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.821e+01 9.434e+01 1.007e+02 1.194e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 13:55:22,320 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 900, loss[loss=0.07817, simple_loss=0.1148, pruned_loss=0.01256, audio_tagging_loss=0.008215, over 16317.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09005, pruned_loss=0.0121, audio_tagging_loss=0.008976, over 3015898.76 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:55:46,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=15.0 2023-11-28 13:55:46,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5 2023-11-28 13:55:50,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3533146.6666666665, ans=0.125 2023-11-28 13:55:55,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3533146.6666666665, ans=0.125 2023-11-28 13:56:14,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3533280.0, ans=0.125 2023-11-28 13:56:16,914 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530000 2023-11-28 13:56:21,539 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 950, loss[loss=0.07829, simple_loss=0.1091, pruned_loss=0.0151, audio_tagging_loss=0.008634, over 15015.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09024, pruned_loss=0.01205, audio_tagging_loss=0.008845, over 3025965.60 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:56:27,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2023-11-28 13:56:29,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.22 vs. 
limit=22.5 2023-11-28 13:56:32,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3533413.3333333335, ans=0.09899494936611666 2023-11-28 13:56:35,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3533413.3333333335, ans=0.2 2023-11-28 13:56:43,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3533480.0, ans=0.1 2023-11-28 13:56:51,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=22.5 2023-11-28 13:56:56,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3533546.6666666665, ans=0.2 2023-11-28 13:56:56,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3533546.6666666665, ans=0.2 2023-11-28 13:57:14,559 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530050 2023-11-28 13:57:18,892 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.954e+01 9.513e+01 1.020e+02 1.278e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 13:57:18,919 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1000, loss[loss=0.08235, simple_loss=0.1144, pruned_loss=0.01722, audio_tagging_loss=0.007941, over 16202.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08954, pruned_loss=0.01191, audio_tagging_loss=0.008788, over 3030779.11 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:57:19,145 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:57:24,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-28 13:57:37,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3533746.6666666665, ans=0.0 2023-11-28 13:57:45,889 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:57:50,421 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:57:54,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3533880.0, ans=0.1 2023-11-28 13:58:11,620 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530100 2023-11-28 13:58:13,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3533946.6666666665, ans=0.0 2023-11-28 13:58:15,918 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1050, loss[loss=0.06178, simple_loss=0.08728, pruned_loss=0.008797, audio_tagging_loss=0.009337, over 15936.00 frames. 
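The WARNING above drops an AudioSet filler cut for a concrete reason: its 100 input frames shrink to 23 after the frontend's subsampling, fewer than the 24 BPE tokens of its dummy transcript, so no monotonic transducer alignment exists. The 100 → 23 arithmetic is consistent with a 7-frame convolutional shrink followed by two halvings; a hedged sketch of such a filter (the exact formula belongs to the recipe's encoder frontend):

```python
# Sketch: excluding cuts whose subsampled length cannot cover the token
# sequence. The frame formula assumes a conv frontend mapping
# T -> ((T - 7) // 2) // 2, which reproduces the logged 100 -> 23; the real
# recipe may differ in detail.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    t = frames_after_subsampling(num_frames)
    return t >= num_tokens  # need at least one output frame per token

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # matches the excluded dummy-text cut above
```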
], tot_loss[loss=0.06523, simple_loss=0.08924, pruned_loss=0.01195, audio_tagging_loss=0.008659, over 3036496.70 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:58:23,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3534013.3333333335, ans=0.125 2023-11-28 13:58:58,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3534213.3333333335, ans=0.125 2023-11-28 13:59:08,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3534280.0, ans=0.125 2023-11-28 13:59:09,415 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530150 2023-11-28 13:59:13,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-28 13:59:14,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.910e+01 9.787e+01 1.025e+02 1.500e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-28 13:59:14,322 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1100, loss[loss=0.06688, simple_loss=0.09548, pruned_loss=0.01263, audio_tagging_loss=0.006513, over 15154.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08888, pruned_loss=0.01202, audio_tagging_loss=0.008615, over 3029575.50 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:59:19,348 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:59:34,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3534413.3333333335, ans=0.125 2023-11-28 13:59:39,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0 2023-11-28 13:59:44,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3534480.0, ans=0.125 2023-11-28 13:59:46,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3534480.0, ans=0.125 2023-11-28 13:59:48,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2023-11-28 13:59:51,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3534546.6666666665, ans=0.0 2023-11-28 13:59:52,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. 
limit=10.0 2023-11-28 13:59:56,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3534546.6666666665, ans=0.2 2023-11-28 14:00:04,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3534613.3333333335, ans=0.0 2023-11-28 14:00:08,084 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530200 2023-11-28 14:00:12,826 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1150, loss[loss=0.08517, simple_loss=0.1179, pruned_loss=0.01984, audio_tagging_loss=0.006398, over 16341.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08831, pruned_loss=0.01216, audio_tagging_loss=0.008629, over 3035176.30 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:00:25,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3534746.6666666665, ans=0.125 2023-11-28 14:00:45,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3534813.3333333335, ans=0.125 2023-11-28 14:00:54,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2023-11-28 14:00:54,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-11-28 14:01:06,344 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530250 2023-11-28 14:01:10,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 9.000e+01 9.554e+01 1.019e+02 1.286e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 14:01:10,714 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1200, loss[loss=0.06331, simple_loss=0.09319, pruned_loss=0.01137, audio_tagging_loss=0.005338, over 14286.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08722, pruned_loss=0.01204, audio_tagging_loss=0.008577, over 3035236.80 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:01:10,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-28 14:01:13,104 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:01:15,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-28 14:01:17,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3535013.3333333335, ans=0.1 2023-11-28 14:01:19,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-28 14:01:27,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3535080.0, ans=0.125 2023-11-28 14:01:30,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.54 vs. 
limit=8.0 2023-11-28 14:01:36,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3535146.6666666665, ans=0.125 2023-11-28 14:01:46,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2023-11-28 14:02:04,442 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530300 2023-11-28 14:02:09,388 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1250, loss[loss=0.07416, simple_loss=0.1045, pruned_loss=0.01405, audio_tagging_loss=0.007887, over 15377.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08827, pruned_loss=0.01202, audio_tagging_loss=0.008637, over 3047052.97 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:02:14,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0 2023-11-28 14:02:18,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.51 vs. limit=10.0 2023-11-28 14:02:26,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=22.5 2023-11-28 14:02:27,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3535413.3333333335, ans=0.0 2023-11-28 14:02:45,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3535546.6666666665, ans=0.125 2023-11-28 14:02:49,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3535546.6666666665, ans=0.0 2023-11-28 14:03:02,138 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530350 2023-11-28 14:03:07,353 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1300, loss[loss=0.0631, simple_loss=0.08286, pruned_loss=0.00995, audio_tagging_loss=0.01173, over 14096.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08821, pruned_loss=0.01182, audio_tagging_loss=0.00857, over 3046935.28 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:03:08,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.419e+01 9.205e+01 1.020e+02 1.250e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 14:03:19,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-28 14:03:26,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3535746.6666666665, ans=0.0 2023-11-28 14:03:32,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3535813.3333333335, ans=0.07 2023-11-28 14:03:34,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. 
limit=12.0 2023-11-28 14:03:48,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3535880.0, ans=0.0 2023-11-28 14:04:01,220 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530400 2023-11-28 14:04:05,969 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1350, loss[loss=0.06653, simple_loss=0.09343, pruned_loss=0.01167, audio_tagging_loss=0.008151, over 14961.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08849, pruned_loss=0.01179, audio_tagging_loss=0.008515, over 3039669.14 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:04:23,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3536080.0, ans=0.125 2023-11-28 14:04:32,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3536146.6666666665, ans=0.2 2023-11-28 14:04:50,243 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:04:59,494 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530450 2023-11-28 14:05:01,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3536280.0, ans=0.1 2023-11-28 14:05:03,905 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1400, loss[loss=0.05644, simple_loss=0.07276, pruned_loss=0.01031, audio_tagging_loss=0.009749, over 14208.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.0885, pruned_loss=0.0119, audio_tagging_loss=0.008568, over 3038924.83 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:05:06,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.943e+01 9.366e+01 9.966e+01 1.345e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 14:05:32,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2023-11-28 14:05:44,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3536546.6666666665, ans=0.0 2023-11-28 14:05:53,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3536613.3333333335, ans=0.0 2023-11-28 14:05:54,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3536613.3333333335, ans=0.125 2023-11-28 14:05:57,454 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530500 2023-11-28 14:05:57,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3536613.3333333335, ans=0.125 2023-11-28 14:06:01,796 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1450, loss[loss=0.07304, simple_loss=0.1067, pruned_loss=0.01206, audio_tagging_loss=0.007641, over 15450.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08942, pruned_loss=0.01208, audio_tagging_loss=0.008607, over 3045415.18 frames. 
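The grad_scale field in the batch records is not constant: it sits at 32.0 early in this section, drops to 16.0 and then 8.0 (as at batches 1350 and 1400 above), and later recovers. That is the usual fp16 dynamic loss scale: halved whenever a step overflows, doubled back after a run of finite steps. A minimal sketch in the spirit of torch.cuda.amp.GradScaler, with illustrative constants rather than the run's actual settings:

```python
# Sketch: dynamic fp16 loss scaling -- halve on overflow, double after
# `growth_interval` clean steps. Constants are illustrative assumptions.
class DynamicLossScale:
    def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale /= 2.0       # e.g. 32.0 -> 16.0 -> 8.0, as in the log
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= 2.0   # recover: 8.0 -> 16.0 -> 32.0
                self._good_steps = 0
```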
], batch size: 56, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:06:08,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3536680.0, ans=0.1 2023-11-28 14:06:09,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3536680.0, ans=0.0 2023-11-28 14:06:20,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3536746.6666666665, ans=0.125 2023-11-28 14:06:25,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3536813.3333333335, ans=0.125 2023-11-28 14:06:55,242 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530550 2023-11-28 14:07:00,235 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1500, loss[loss=0.05781, simple_loss=0.08632, pruned_loss=0.005558, audio_tagging_loss=0.00909, over 15139.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09035, pruned_loss=0.01222, audio_tagging_loss=0.008724, over 3045053.83 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:07:02,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 9.037e+01 9.664e+01 1.030e+02 1.385e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 14:07:18,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3537080.0, ans=0.2 2023-11-28 14:07:23,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3537146.6666666665, ans=0.125 2023-11-28 14:07:39,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3537213.3333333335, ans=0.2 2023-11-28 14:07:53,579 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530600 2023-11-28 14:07:56,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3537280.0, ans=0.125 2023-11-28 14:07:58,708 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1550, loss[loss=0.07494, simple_loss=0.08905, pruned_loss=0.01804, audio_tagging_loss=0.01238, over 14763.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08986, pruned_loss=0.01222, audio_tagging_loss=0.008768, over 3044946.12 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:08:08,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3537346.6666666665, ans=0.0 2023-11-28 14:08:23,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.97 vs. limit=15.0 2023-11-28 14:08:39,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. 
limit=10.0 2023-11-28 14:08:40,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3537546.6666666665, ans=0.035 2023-11-28 14:08:42,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3537546.6666666665, ans=0.04949747468305833 2023-11-28 14:08:51,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530650 2023-11-28 14:08:55,986 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1600, loss[loss=0.0634, simple_loss=0.0796, pruned_loss=0.01233, audio_tagging_loss=0.01127, over 14696.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08938, pruned_loss=0.01219, audio_tagging_loss=0.008886, over 3040707.00 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:08:58,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.016e+01 9.104e+01 9.583e+01 1.052e+02 1.503e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 14:08:58,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3537680.0, ans=0.2 2023-11-28 14:09:03,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3537680.0, ans=0.125 2023-11-28 14:09:04,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3537680.0, ans=0.0 2023-11-28 14:09:25,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3537813.3333333335, ans=0.0 2023-11-28 14:09:48,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530700 2023-11-28 14:09:50,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3537946.6666666665, ans=0.07 2023-11-28 14:09:53,828 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1650, loss[loss=0.05508, simple_loss=0.07576, pruned_loss=0.007821, audio_tagging_loss=0.009375, over 14286.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08874, pruned_loss=0.01205, audio_tagging_loss=0.008907, over 3033052.34 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:10:15,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3538146.6666666665, ans=0.125 2023-11-28 14:10:16,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-11-28 14:10:33,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3538213.3333333335, ans=0.125 2023-11-28 14:10:40,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3538280.0, ans=0.125 2023-11-28 14:10:44,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2023-11-28 14:10:47,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530750 2023-11-28 14:10:51,977 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1700, loss[loss=0.07228, simple_loss=0.1114, pruned_loss=0.008293, audio_tagging_loss=0.00827, over 14524.00 frames. 
], tot_loss[loss=0.06554, simple_loss=0.08929, pruned_loss=0.01205, audio_tagging_loss=0.008846, over 3039445.85 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:10:54,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.956e+01 9.565e+01 1.008e+02 1.733e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-28 14:11:14,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs. limit=6.0 2023-11-28 14:11:18,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538480.0, ans=0.1 2023-11-28 14:11:22,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3538480.0, ans=0.2 2023-11-28 14:11:29,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-28 14:11:45,469 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530800 2023-11-28 14:11:50,089 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1750, loss[loss=0.07155, simple_loss=0.1032, pruned_loss=0.01239, audio_tagging_loss=0.007577, over 15332.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08922, pruned_loss=0.01204, audio_tagging_loss=0.008786, over 3044597.76 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:11:58,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0 2023-11-28 14:12:02,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3538746.6666666665, ans=0.0 2023-11-28 14:12:08,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3538746.6666666665, ans=0.04949747468305833 2023-11-28 14:12:11,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3538746.6666666665, ans=0.125 2023-11-28 14:12:14,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3538813.3333333335, ans=0.0 2023-11-28 14:12:15,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538813.3333333335, ans=0.1 2023-11-28 14:12:20,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3538813.3333333335, ans=0.07 2023-11-28 14:12:43,149 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530850 2023-11-28 14:12:47,429 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1800, loss[loss=0.06755, simple_loss=0.0913, pruned_loss=0.01076, audio_tagging_loss=0.01114, over 16756.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08918, pruned_loss=0.01201, audio_tagging_loss=0.008706, over 3053662.42 frames. 
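The lr field is flat within an epoch but steps down at each epoch boundary (1.53e-03 in epoch 44 → 1.51e-03 in epoch 45), which matches an Eden-style schedule that decays as an inverse fourth root in both batch count and epoch. The sketch below reproduces both logged values at batch idx ≈ 529k under assumed constants (base LR 0.045, lr_batches 7500, lr_epochs 3.5, zero-based epoch index); treat those as a fit to the log, not as read from the run's configuration:

```python
# Sketch: Eden-style LR schedule, lr = base * f(batch) * g(epoch), using the
# inverse-fourth-root form common in Zipformer recipes. All constants are
# assumptions chosen to roughly reproduce the logged values.
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 528700, 43):.2e}")  # ~1.53e-03, epoch 44 (0-based 43)
print(f"{eden_lr(0.045, 529100, 44):.2e}")  # ~1.51e-03, epoch 45
```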
2023-11-28 14:12:50,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.905e+01 9.378e+01 9.880e+01 1.265e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-28 14:13:17,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3539146.6666666665, ans=0.125
2023-11-28 14:13:17,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3539146.6666666665, ans=0.125
2023-11-28 14:13:35,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.48 vs. limit=10.0
2023-11-28 14:13:41,508 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530900
2023-11-28 14:13:46,507 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1850, loss[loss=0.06877, simple_loss=0.08786, pruned_loss=0.01468, audio_tagging_loss=0.01016, over 14040.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09021, pruned_loss=0.01224, audio_tagging_loss=0.008633, over 3056812.10 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:13:58,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3539413.3333333335, ans=0.0
2023-11-28 14:14:18,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3539480.0, ans=0.125
2023-11-28 14:14:26,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3539546.6666666665, ans=0.0
2023-11-28 14:14:28,166 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 14:14:37,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3539613.3333333335, ans=0.125
2023-11-28 14:14:37,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3539613.3333333335, ans=0.1
2023-11-28 14:14:40,737 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530950
2023-11-28 14:14:45,117 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1900, loss[loss=0.07214, simple_loss=0.0934, pruned_loss=0.01665, audio_tagging_loss=0.008794, over 14671.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08962, pruned_loss=0.0122, audio_tagging_loss=0.008602, over 3064204.16 frames.
], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:14:45,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3539680.0, ans=0.125 2023-11-28 14:14:47,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.623e+01 9.343e+01 1.003e+02 1.342e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 14:15:10,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3539813.3333333335, ans=0.0 2023-11-28 14:15:11,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3539813.3333333335, ans=0.0 2023-11-28 14:15:38,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531000 2023-11-28 14:15:43,046 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1950, loss[loss=0.05303, simple_loss=0.07084, pruned_loss=0.009819, audio_tagging_loss=0.007793, over 14432.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08949, pruned_loss=0.01208, audio_tagging_loss=0.008681, over 3055395.90 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:15:44,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3540013.3333333335, ans=0.125 2023-11-28 14:15:54,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3540080.0, ans=0.125 2023-11-28 14:16:05,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3540146.6666666665, ans=0.1 2023-11-28 14:16:35,974 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531050 2023-11-28 14:16:40,329 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2000, loss[loss=0.07808, simple_loss=0.1116, pruned_loss=0.0153, audio_tagging_loss=0.006991, over 15621.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08983, pruned_loss=0.01221, audio_tagging_loss=0.00863, over 3055994.31 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:16:42,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 8.874e+01 9.480e+01 1.027e+02 1.449e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 14:17:07,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3540480.0, ans=0.0 2023-11-28 14:17:14,523 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:17:14,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-28 14:17:18,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3540546.6666666665, ans=0.125 2023-11-28 14:17:21,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. 
limit=15.0 2023-11-28 14:17:24,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3540546.6666666665, ans=0.125 2023-11-28 14:17:34,597 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531100 2023-11-28 14:17:39,032 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2050, loss[loss=0.05259, simple_loss=0.07032, pruned_loss=0.005524, audio_tagging_loss=0.01191, over 14865.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08957, pruned_loss=0.01217, audio_tagging_loss=0.008609, over 3049859.73 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:17:56,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3540746.6666666665, ans=0.125 2023-11-28 14:18:13,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3540880.0, ans=0.125 2023-11-28 14:18:19,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3540880.0, ans=0.125 2023-11-28 14:18:31,775 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531150 2023-11-28 14:18:34,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-11-28 14:18:36,114 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2100, loss[loss=0.06383, simple_loss=0.08329, pruned_loss=0.01092, audio_tagging_loss=0.01126, over 14878.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08913, pruned_loss=0.01198, audio_tagging_loss=0.008661, over 3047610.04 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:18:39,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.857e+01 9.324e+01 1.026e+02 1.303e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 14:18:44,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3541013.3333333335, ans=0.0 2023-11-28 14:18:58,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0 2023-11-28 14:18:59,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3541146.6666666665, ans=0.0 2023-11-28 14:18:59,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541146.6666666665, ans=0.1 2023-11-28 14:19:05,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.89 vs. limit=10.0 2023-11-28 14:19:12,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3541213.3333333335, ans=0.125 2023-11-28 14:19:22,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3541280.0, ans=0.2 2023-11-28 14:19:29,720 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531200 2023-11-28 14:19:34,415 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2150, loss[loss=0.05028, simple_loss=0.0646, pruned_loss=0.00582, audio_tagging_loss=0.01216, over 14899.00 frames. 
], tot_loss[loss=0.06557, simple_loss=0.08988, pruned_loss=0.012, audio_tagging_loss=0.008621, over 3045147.81 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:19:49,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-28 14:19:52,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-28 14:19:53,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-28 14:19:59,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3541480.0, ans=0.0 2023-11-28 14:20:00,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3541480.0, ans=0.1 2023-11-28 14:20:03,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3541480.0, ans=0.0 2023-11-28 14:20:08,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3541546.6666666665, ans=0.125 2023-11-28 14:20:09,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3541546.6666666665, ans=0.125 2023-11-28 14:20:09,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3541546.6666666665, ans=0.0 2023-11-28 14:20:12,143 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:20:14,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3541546.6666666665, ans=0.5 2023-11-28 14:20:14,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3541546.6666666665, ans=15.0 2023-11-28 14:20:28,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531250 2023-11-28 14:20:30,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541613.3333333335, ans=0.1 2023-11-28 14:20:32,792 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2200, loss[loss=0.07207, simple_loss=0.09993, pruned_loss=0.01339, audio_tagging_loss=0.008714, over 14447.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09041, pruned_loss=0.01199, audio_tagging_loss=0.008594, over 3049483.97 frames. 
], batch size: 54, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:20:36,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.900e+01 9.511e+01 1.009e+02 1.221e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 14:21:01,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3541813.3333333335, ans=0.1
2023-11-28 14:21:07,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3541880.0, ans=0.125
2023-11-28 14:21:16,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541880.0, ans=0.1
2023-11-28 14:21:18,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3541946.6666666665, ans=0.0
2023-11-28 14:21:22,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541946.6666666665, ans=0.1
2023-11-28 14:21:25,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3541946.6666666665, ans=0.125
2023-11-28 14:21:26,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531300
2023-11-28 14:21:31,078 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2250, loss[loss=0.06448, simple_loss=0.08253, pruned_loss=0.01386, audio_tagging_loss=0.009345, over 14191.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08996, pruned_loss=0.01203, audio_tagging_loss=0.008656, over 3042573.60 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:22:06,528 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 14:22:16,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3542280.0, ans=0.125
2023-11-28 14:22:24,233 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531350
2023-11-28 14:22:28,599 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2300, loss[loss=0.05774, simple_loss=0.07826, pruned_loss=0.01062, audio_tagging_loss=0.00799, over 14558.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09012, pruned_loss=0.01212, audio_tagging_loss=0.008694, over 3045703.10 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:22:31,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 9.166e+01 9.728e+01 1.033e+02 1.405e+02, threshold=1.946e+02, percent-clipped=0.0
2023-11-28 14:22:53,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3542480.0, ans=0.0
2023-11-28 14:22:55,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3542480.0, ans=0.125
2023-11-28 14:23:02,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3542546.6666666665, ans=0.1
2023-11-28 14:23:10,857 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=22.5
2023-11-28 14:23:21,464 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
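The WARNING above shows the length sanity check applied to AudioSet cuts that carry a dummy transcript: a 1-second clip yields 100 feature frames, the encoder's subsampling leaves 23 output frames, and a transducer cannot align 24 BPE tokens to 23 frames, so the cut is dropped. Below is a minimal sketch of that filter, assuming the check is exactly "output frames must cover the token count"; the precise subsampling arithmetic in train_asr.py is not reproduced here.

def keep_for_training(num_frames_after_subsampling: int,
                      num_tokens: int) -> bool:
    # The RNN-T/pruned-transducer lattice requires at least one encoder
    # output frame per token, so shorter cuts are excluded up front.
    return num_frames_after_subsampling >= num_tokens

# The warned cut: 23 frames after subsampling vs. 24 tokens -> dropped.
assert not keep_for_training(23, 24)

Every 1-second AudioSet cut with this placeholder text fails the same 23-vs-24 comparison, so these warnings recur throughout the epoch and are expected noise rather than a data problem.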
2023-11-28 14:23:21,512 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531400
2023-11-28 14:23:26,664 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2350, loss[loss=0.06652, simple_loss=0.0779, pruned_loss=0.01657, audio_tagging_loss=0.011, over 16276.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08992, pruned_loss=0.01231, audio_tagging_loss=0.008786, over 3047215.67 frames. ], batch size: 63, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:23:29,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3542680.0, ans=0.0
2023-11-28 14:23:38,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0
2023-11-28 14:23:43,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3542746.6666666665, ans=0.125
2023-11-28 14:23:50,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3542813.3333333335, ans=0.0
2023-11-28 14:24:17,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3542946.6666666665, ans=0.125
2023-11-28 14:24:20,470 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531450
2023-11-28 14:24:25,472 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2400, loss[loss=0.06847, simple_loss=0.1009, pruned_loss=0.009775, audio_tagging_loss=0.008227, over 15229.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09028, pruned_loss=0.01218, audio_tagging_loss=0.0089, over 3051708.01 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 32.0
2023-11-28 14:24:27,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3543013.3333333335, ans=0.0
2023-11-28 14:24:28,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 8.802e+01 9.417e+01 9.979e+01 1.299e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-28 14:24:30,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3543013.3333333335, ans=0.0
2023-11-28 14:24:57,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3543146.6666666665, ans=0.125
2023-11-28 14:25:05,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.61 vs. limit=10.0
2023-11-28 14:25:18,425 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531500
2023-11-28 14:25:23,331 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2450, loss[loss=0.04734, simple_loss=0.06008, pruned_loss=0.005233, audio_tagging_loss=0.01207, over 15610.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08962, pruned_loss=0.01204, audio_tagging_loss=0.009041, over 3050795.47 frames.
], batch size: 58, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:25:33,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3543413.3333333335, ans=0.2 2023-11-28 14:25:33,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2023-11-28 14:25:46,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=12.0 2023-11-28 14:25:52,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3543480.0, ans=0.125 2023-11-28 14:25:56,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3543480.0, ans=0.125 2023-11-28 14:26:16,406 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531550 2023-11-28 14:26:21,222 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2500, loss[loss=0.06574, simple_loss=0.08366, pruned_loss=0.01432, audio_tagging_loss=0.009593, over 16116.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08943, pruned_loss=0.01191, audio_tagging_loss=0.009033, over 3044951.18 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:26:21,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3543680.0, ans=0.0 2023-11-28 14:26:25,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.030e+01 9.693e+01 1.035e+02 1.388e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 14:26:31,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3543746.6666666665, ans=0.0 2023-11-28 14:26:34,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3543746.6666666665, ans=0.0 2023-11-28 14:26:51,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3543813.3333333335, ans=0.125 2023-11-28 14:26:55,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3543880.0, ans=0.125 2023-11-28 14:26:56,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3543880.0, ans=0.2 2023-11-28 14:27:01,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3543880.0, ans=0.125 2023-11-28 14:27:14,742 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531600 2023-11-28 14:27:16,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3543946.6666666665, ans=10.0 2023-11-28 14:27:19,440 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2550, loss[loss=0.05905, simple_loss=0.08354, pruned_loss=0.009556, audio_tagging_loss=0.007726, over 14625.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08895, pruned_loss=0.01183, audio_tagging_loss=0.008974, over 3048837.73 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:27:58,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3544213.3333333335, ans=0.04949747468305833
2023-11-28 14:28:00,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=15.0
2023-11-28 14:28:06,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3544280.0, ans=0.0
2023-11-28 14:28:13,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531650
2023-11-28 14:28:18,555 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2600, loss[loss=0.07793, simple_loss=0.109, pruned_loss=0.01404, audio_tagging_loss=0.009376, over 14708.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08884, pruned_loss=0.01187, audio_tagging_loss=0.008861, over 3046016.17 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 14:28:24,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.904e+01 9.542e+01 1.021e+02 1.415e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-28 14:28:34,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3544413.3333333335, ans=0.07
2023-11-28 14:28:39,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3544413.3333333335, ans=0.2
2023-11-28 14:28:41,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3544480.0, ans=0.125
2023-11-28 14:28:57,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3544546.6666666665, ans=0.125
2023-11-28 14:29:06,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3544613.3333333335, ans=0.95
2023-11-28 14:29:10,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3544613.3333333335, ans=0.0
2023-11-28 14:29:11,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=22.5
2023-11-28 14:29:11,496 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531700
2023-11-28 14:29:11,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0
2023-11-28 14:29:12,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3544613.3333333335, ans=0.125
2023-11-28 14:29:16,002 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2650, loss[loss=0.07467, simple_loss=0.1023, pruned_loss=0.0157, audio_tagging_loss=0.007827, over 15237.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.089, pruned_loss=0.01198, audio_tagging_loss=0.008749, over 3043154.60 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 14:29:24,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0
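The Whitening entries compare a per-module statistic of the activations (metric) against a ceiling (limit), above which scaling.py's Whiten module applies a corrective gradient that pushes the features back toward a whiter covariance. The actual statistic lives in scaling.py and is not reproduced here; the sketch below uses one plausible choice, the ratio of the largest covariance eigenvalue to the mean eigenvalue, purely for illustration.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one whitening group.
    # Illustrative statistic: largest eigenvalue of the feature covariance
    # divided by the mean eigenvalue. Perfectly whitened features give 1.0;
    # strongly correlated channels push the ratio up.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return eigs.max() / eigs.mean()

# Toy check with 384 channels, as in several of the entries above.
feats = torch.randn(2000, 384)
print(float(whitening_metric(feats)))  # modest value for near-white noise; correlated features score much higher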
2023-11-28 14:29:42,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3544813.3333333335, ans=0.125
2023-11-28 14:29:51,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3544880.0, ans=0.0
2023-11-28 14:29:51,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.37 vs. limit=10.0
2023-11-28 14:30:09,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3544946.6666666665, ans=0.0
2023-11-28 14:30:10,244 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531750
2023-11-28 14:30:14,512 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2700, loss[loss=0.06613, simple_loss=0.08528, pruned_loss=0.0133, audio_tagging_loss=0.01019, over 15875.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.0891, pruned_loss=0.01203, audio_tagging_loss=0.00869, over 3044124.93 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 14:30:19,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.994e+01 9.441e+01 1.024e+02 1.210e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-28 14:30:21,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0
2023-11-28 14:30:43,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3545146.6666666665, ans=0.2
2023-11-28 14:30:48,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3545213.3333333335, ans=0.1
2023-11-28 14:30:51,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3545213.3333333335, ans=0.125
2023-11-28 14:30:59,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3545280.0, ans=0.0
2023-11-28 14:31:04,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0
2023-11-28 14:31:07,516 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531800
2023-11-28 14:31:12,760 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2750, loss[loss=0.05002, simple_loss=0.07107, pruned_loss=0.006478, audio_tagging_loss=0.008011, over 15380.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08859, pruned_loss=0.012, audio_tagging_loss=0.008704, over 3039110.08 frames.
], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:31:30,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3545413.3333333335, ans=0.0 2023-11-28 14:31:33,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3545413.3333333335, ans=0.0 2023-11-28 14:31:45,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3545480.0, ans=0.0 2023-11-28 14:31:54,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3545546.6666666665, ans=0.125 2023-11-28 14:32:05,261 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:32:06,435 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531850 2023-11-28 14:32:06,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3545613.3333333335, ans=0.0 2023-11-28 14:32:10,835 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2800, loss[loss=0.07639, simple_loss=0.1038, pruned_loss=0.01817, audio_tagging_loss=0.006335, over 14832.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08949, pruned_loss=0.01213, audio_tagging_loss=0.008613, over 3040593.52 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:32:16,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.903e+01 9.379e+01 1.017e+02 3.083e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-28 14:32:16,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3545680.0, ans=0.02 2023-11-28 14:32:42,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3545813.3333333335, ans=0.07 2023-11-28 14:32:48,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3545880.0, ans=0.1 2023-11-28 14:32:56,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3545946.6666666665, ans=0.2 2023-11-28 14:33:04,488 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531900 2023-11-28 14:33:05,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3545946.6666666665, ans=0.125 2023-11-28 14:33:07,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3546013.3333333335, ans=0.1 2023-11-28 14:33:08,784 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2850, loss[loss=0.07639, simple_loss=0.1105, pruned_loss=0.01261, audio_tagging_loss=0.008531, over 16353.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08893, pruned_loss=0.01216, audio_tagging_loss=0.00858, over 3046560.69 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:33:35,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3546146.6666666665, ans=0.125 2023-11-28 14:33:44,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3546213.3333333335, ans=0.0 2023-11-28 14:33:51,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3546213.3333333335, ans=10.0 2023-11-28 14:33:57,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-28 14:34:01,806 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531950 2023-11-28 14:34:01,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3546280.0, ans=0.125 2023-11-28 14:34:06,184 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2900, loss[loss=0.04998, simple_loss=0.06832, pruned_loss=0.00502, audio_tagging_loss=0.0108, over 14893.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08992, pruned_loss=0.01233, audio_tagging_loss=0.008561, over 3051509.77 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:34:11,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3546346.6666666665, ans=0.0 2023-11-28 14:34:12,363 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.752e+01 9.369e+01 1.016e+02 1.365e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 14:34:32,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3546480.0, ans=0.2 2023-11-28 14:35:00,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532000 2023-11-28 14:35:07,385 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2950, loss[loss=0.05271, simple_loss=0.06066, pruned_loss=0.01149, audio_tagging_loss=0.01089, over 14604.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08973, pruned_loss=0.01228, audio_tagging_loss=0.00855, over 3056795.64 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:35:27,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3546746.6666666665, ans=0.125 2023-11-28 14:35:35,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3546813.3333333335, ans=0.125 2023-11-28 14:35:39,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3546813.3333333335, ans=0.125 2023-11-28 14:35:58,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. 
limit=15.0
2023-11-28 14:36:01,128 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532050
2023-11-28 14:36:03,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3546946.6666666665, ans=0.125
2023-11-28 14:36:05,951 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3000, loss[loss=0.05457, simple_loss=0.07595, pruned_loss=0.007901, audio_tagging_loss=0.008694, over 14923.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08973, pruned_loss=0.01236, audio_tagging_loss=0.008602, over 3051070.02 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 14:36:05,951 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-28 14:36:23,729 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.9177, 2.5700, 2.2704, 2.6509, 2.4099, 2.4593, 2.4290, 2.5017], device='cuda:3')
2023-11-28 14:36:32,599 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9949, 4.0523, 4.8590, 4.4721], device='cuda:3')
2023-11-28 14:36:34,338 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7959, 5.8518, 5.8993, 5.8907], device='cuda:3')
2023-11-28 14:36:41,298 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05774, simple_loss=0.05054, pruned_loss=0.005299, audio_tagging_loss=0.02717, over 4681554.00 frames.
2023-11-28 14:36:41,298 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-28 14:36:46,868 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.901e+01 9.475e+01 1.021e+02 1.271e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-28 14:36:49,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3547013.3333333335, ans=0.1
2023-11-28 14:37:07,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3547146.6666666665, ans=0.125
2023-11-28 14:37:26,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0
2023-11-28 14:37:34,355 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532100
2023-11-28 14:37:39,439 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3050, loss[loss=0.07978, simple_loss=0.1109, pruned_loss=0.01426, audio_tagging_loss=0.01008, over 14955.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08936, pruned_loss=0.01223, audio_tagging_loss=0.008744, over 3049089.95 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 14:37:42,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3547346.6666666665, ans=0.0
2023-11-28 14:38:15,444 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
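During the validation pass above, the script also dumps attn_weights_entropy tensors from zipformer.py, one value per attention head: entropies near log(sequence length) indicate diffuse attention, while values near zero indicate heads that have collapsed onto single frames. Below is a sketch of one way such a diagnostic can be computed; the exact reduction in zipformer.py may differ.

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), rows softmax-normalised.
    # Returns the mean entropy of each head's attention distribution.
    ent = -(attn * (attn + 1.0e-20).log()).sum(dim=-1)  # (num_heads, query_len)
    return ent.mean(dim=-1)                             # (num_heads,)

# Four heads over a 50-frame toy sequence, cf. the 4-element tensors above.
weights = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(weights))

Because it is printed only at validation time, this kind of check adds nothing to the cost of ordinary training steps while making attention collapse visible before it shows up in the loss.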
2023-11-28 14:38:33,622 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532150
2023-11-28 14:38:37,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3547680.0, ans=0.125
2023-11-28 14:38:38,085 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3100, loss[loss=0.07188, simple_loss=0.09233, pruned_loss=0.01507, audio_tagging_loss=0.01063, over 14198.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08972, pruned_loss=0.01231, audio_tagging_loss=0.008859, over 3047963.73 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 14:38:43,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.815e+01 9.478e+01 1.004e+02 1.302e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-28 14:38:45,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.81 vs. limit=22.5
2023-11-28 14:38:48,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3547746.6666666665, ans=0.0
2023-11-28 14:39:13,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3547880.0, ans=0.0
2023-11-28 14:39:31,186 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532200
2023-11-28 14:39:35,923 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3150, loss[loss=0.06469, simple_loss=0.0868, pruned_loss=0.009674, audio_tagging_loss=0.01161, over 14632.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09046, pruned_loss=0.01241, audio_tagging_loss=0.008939, over 3044357.62 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 14:39:39,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3548013.3333333335, ans=0.125
2023-11-28 14:40:12,935 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 14:40:15,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3548146.6666666665, ans=0.125
2023-11-28 14:40:40,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0
2023-11-28 14:40:47,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3548213.3333333335, ans=0.0
2023-11-28 14:41:13,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3548280.0, ans=0.0
2023-11-28 14:41:16,651 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532250
2023-11-28 14:41:21,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3548346.6666666665, ans=0.1
2023-11-28 14:41:23,672 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3200, loss[loss=0.04975, simple_loss=0.06184, pruned_loss=0.009355, audio_tagging_loss=0.009478, over 14723.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08986, pruned_loss=0.01235, audio_tagging_loss=0.009078, over 3047286.82 frames.
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:41:33,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.062e+01 8.970e+01 9.466e+01 1.009e+02 1.247e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 14:41:33,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3548346.6666666665, ans=0.125 2023-11-28 14:42:19,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.03 vs. limit=10.0 2023-11-28 14:42:25,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0 2023-11-28 14:42:35,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3548546.6666666665, ans=0.125 2023-11-28 14:42:41,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3548613.3333333335, ans=0.125 2023-11-28 14:42:51,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3548613.3333333335, ans=0.0 2023-11-28 14:42:53,492 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532300 2023-11-28 14:42:56,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3548613.3333333335, ans=0.0 2023-11-28 14:42:59,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3548680.0, ans=0.0 2023-11-28 14:43:00,647 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3250, loss[loss=0.06346, simple_loss=0.08413, pruned_loss=0.01386, audio_tagging_loss=0.007532, over 15147.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08981, pruned_loss=0.01224, audio_tagging_loss=0.009028, over 3045982.31 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:43:10,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3548680.0, ans=0.0 2023-11-28 14:43:40,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3548813.3333333335, ans=0.125 2023-11-28 14:44:01,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3548880.0, ans=0.0 2023-11-28 14:44:27,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532350 2023-11-28 14:44:30,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3548946.6666666665, ans=0.07 2023-11-28 14:44:35,248 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3300, loss[loss=0.07192, simple_loss=0.105, pruned_loss=0.01162, audio_tagging_loss=0.007798, over 14847.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08955, pruned_loss=0.01213, audio_tagging_loss=0.009102, over 3048021.59 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:44:49,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 8.914e+01 9.371e+01 1.015e+02 1.378e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 14:44:56,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3549080.0, ans=0.2 2023-11-28 14:45:02,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3549080.0, ans=0.0 2023-11-28 14:45:39,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3549213.3333333335, ans=0.1 2023-11-28 14:45:55,608 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532400 2023-11-28 14:46:02,246 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3350, loss[loss=0.08004, simple_loss=0.1033, pruned_loss=0.01876, audio_tagging_loss=0.009654, over 14284.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09051, pruned_loss=0.01228, audio_tagging_loss=0.008981, over 3050448.40 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:46:44,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3549480.0, ans=0.125 2023-11-28 14:46:44,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3549480.0, ans=0.125 2023-11-28 14:47:18,229 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532450 2023-11-28 14:47:25,132 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3400, loss[loss=0.07323, simple_loss=0.1051, pruned_loss=0.01273, audio_tagging_loss=0.007953, over 16067.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.0899, pruned_loss=0.01225, audio_tagging_loss=0.008824, over 3058557.47 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:47:35,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.841e+01 9.503e+01 1.045e+02 1.895e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 14:47:47,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3549746.6666666665, ans=0.0 2023-11-28 14:48:10,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3549880.0, ans=0.125 2023-11-28 14:48:23,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3549880.0, ans=0.0 2023-11-28 14:48:33,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3549946.6666666665, ans=0.1 2023-11-28 14:48:37,291 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532500 2023-11-28 14:48:42,901 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3450, loss[loss=0.05931, simple_loss=0.08057, pruned_loss=0.01063, audio_tagging_loss=0.008385, over 15392.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09021, pruned_loss=0.01208, audio_tagging_loss=0.008751, over 3060487.12 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:48:54,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3550013.3333333335, ans=0.2 2023-11-28 14:49:27,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3550146.6666666665, ans=0.125 2023-11-28 14:49:41,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3550213.3333333335, ans=0.2 2023-11-28 14:49:53,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532550 2023-11-28 14:49:59,759 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3500, loss[loss=0.05539, simple_loss=0.07912, pruned_loss=0.00913, audio_tagging_loss=0.006701, over 14996.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08903, pruned_loss=0.01193, audio_tagging_loss=0.008653, over 3060157.64 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:50:08,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.938e+01 9.500e+01 1.028e+02 1.250e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 14:50:20,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0 2023-11-28 14:50:39,244 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:50:50,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0 2023-11-28 14:51:07,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532600 2023-11-28 14:51:08,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550613.3333333335, ans=0.1 2023-11-28 14:51:13,083 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3550, loss[loss=0.0723, simple_loss=0.1015, pruned_loss=0.01282, audio_tagging_loss=0.008755, over 15453.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08868, pruned_loss=0.01191, audio_tagging_loss=0.008653, over 3059534.92 frames. 
], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:51:15,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550680.0, ans=0.1 2023-11-28 14:51:22,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3550680.0, ans=0.0 2023-11-28 14:51:43,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3550813.3333333335, ans=0.1 2023-11-28 14:51:50,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3550813.3333333335, ans=0.2 2023-11-28 14:52:18,619 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532650 2023-11-28 14:52:24,464 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3600, loss[loss=0.05565, simple_loss=0.07734, pruned_loss=0.009283, audio_tagging_loss=0.007695, over 15138.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08785, pruned_loss=0.01163, audio_tagging_loss=0.008598, over 3057165.79 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:52:32,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.883e+01 9.624e+01 1.026e+02 1.265e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 14:52:32,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3551013.3333333335, ans=0.0 2023-11-28 14:52:35,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3551013.3333333335, ans=0.1 2023-11-28 14:52:39,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.14 vs. limit=10.0 2023-11-28 14:52:46,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3551080.0, ans=0.1 2023-11-28 14:52:59,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3551146.6666666665, ans=0.125 2023-11-28 14:53:03,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3551146.6666666665, ans=0.1 2023-11-28 14:53:07,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3551213.3333333335, ans=0.125 2023-11-28 14:53:13,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551213.3333333335, ans=0.1 2023-11-28 14:53:24,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3551280.0, ans=0.0 2023-11-28 14:53:28,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3551280.0, ans=0.125 2023-11-28 14:53:29,800 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532700 2023-11-28 14:53:34,848 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3650, loss[loss=0.07663, simple_loss=0.1036, pruned_loss=0.01588, audio_tagging_loss=0.008928, over 15678.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08859, pruned_loss=0.01175, audio_tagging_loss=0.008646, over 3063487.85 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:53:47,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3551413.3333333335, ans=0.125 2023-11-28 14:53:58,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3551413.3333333335, ans=0.07 2023-11-28 14:54:27,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3551546.6666666665, ans=0.0 2023-11-28 14:54:38,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3551613.3333333335, ans=0.125 2023-11-28 14:54:39,913 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532750 2023-11-28 14:54:45,609 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3700, loss[loss=0.0716, simple_loss=0.1041, pruned_loss=0.01339, audio_tagging_loss=0.006172, over 14639.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08831, pruned_loss=0.01172, audio_tagging_loss=0.00864, over 3056435.05 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:54:55,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.996e+01 9.675e+01 1.033e+02 1.277e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 14:55:15,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3551813.3333333335, ans=0.125 2023-11-28 14:55:33,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3551880.0, ans=0.07 2023-11-28 14:55:36,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551880.0, ans=0.1 2023-11-28 14:55:37,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3551880.0, ans=0.125 2023-11-28 14:55:42,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3551946.6666666665, ans=0.0 2023-11-28 14:55:47,807 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532800 2023-11-28 14:55:53,200 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3750, loss[loss=0.05768, simple_loss=0.08093, pruned_loss=0.01067, audio_tagging_loss=0.006543, over 15436.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08797, pruned_loss=0.01178, audio_tagging_loss=0.008699, over 3060471.48 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:55:59,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3552013.3333333335, ans=0.2 2023-11-28 14:56:19,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=12.0 2023-11-28 14:56:22,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3552146.6666666665, ans=0.125 2023-11-28 14:56:29,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.47 vs. 
limit=15.0 2023-11-28 14:56:30,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3552146.6666666665, ans=0.2 2023-11-28 14:56:40,565 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:56:45,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-28 14:56:54,214 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532850 2023-11-28 14:56:54,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3552280.0, ans=0.125 2023-11-28 14:56:58,838 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3800, loss[loss=0.06937, simple_loss=0.09573, pruned_loss=0.01452, audio_tagging_loss=0.006988, over 14298.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08891, pruned_loss=0.01193, audio_tagging_loss=0.008681, over 3058473.27 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:57:04,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2023-11-28 14:57:07,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 9.078e+01 9.747e+01 1.050e+02 1.632e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-28 14:57:38,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3552546.6666666665, ans=0.125 2023-11-28 14:57:39,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3552546.6666666665, ans=0.0 2023-11-28 14:57:50,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2023-11-28 14:57:56,431 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532900 2023-11-28 14:58:01,923 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3850, loss[loss=0.08228, simple_loss=0.1185, pruned_loss=0.01682, audio_tagging_loss=0.0062, over 15674.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08903, pruned_loss=0.01205, audio_tagging_loss=0.008684, over 3058556.80 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:58:19,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3552746.6666666665, ans=10.0 2023-11-28 14:58:59,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532950 2023-11-28 14:59:01,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2023-11-28 14:59:04,012 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3900, loss[loss=0.05833, simple_loss=0.06989, pruned_loss=0.01115, audio_tagging_loss=0.01224, over 14402.00 frames. 
], tot_loss[loss=0.06503, simple_loss=0.08857, pruned_loss=0.01194, audio_tagging_loss=0.0088, over 3054266.62 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:59:04,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3553013.3333333335, ans=10.0 2023-11-28 14:59:12,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.774e+01 9.422e+01 1.004e+02 1.392e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 14:59:14,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3553080.0, ans=0.0 2023-11-28 14:59:31,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-11-28 14:59:37,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3553146.6666666665, ans=0.125 2023-11-28 14:59:39,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3553213.3333333335, ans=0.05 2023-11-28 14:59:40,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3553213.3333333335, ans=10.0 2023-11-28 14:59:52,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=12.0 2023-11-28 14:59:58,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3553280.0, ans=0.0 2023-11-28 14:59:59,736 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533000 2023-11-28 15:00:05,522 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3950, loss[loss=0.08835, simple_loss=0.1236, pruned_loss=0.01829, audio_tagging_loss=0.008252, over 15272.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08921, pruned_loss=0.01204, audio_tagging_loss=0.008966, over 3048037.09 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:00:19,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3553413.3333333335, ans=0.125 2023-11-28 15:00:22,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. 
limit=15.0 2023-11-28 15:00:23,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3553413.3333333335, ans=0.125 2023-11-28 15:00:24,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3553413.3333333335, ans=0.0 2023-11-28 15:00:24,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3553413.3333333335, ans=0.2 2023-11-28 15:00:41,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3553546.6666666665, ans=0.0 2023-11-28 15:00:46,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3553546.6666666665, ans=0.0 2023-11-28 15:01:00,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533050 2023-11-28 15:01:04,948 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4000, loss[loss=0.08496, simple_loss=0.11, pruned_loss=0.02007, audio_tagging_loss=0.009898, over 14765.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.0908, pruned_loss=0.01226, audio_tagging_loss=0.008955, over 3043787.26 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:01:05,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3553680.0, ans=0.0 2023-11-28 15:01:13,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 9.021e+01 9.610e+01 1.042e+02 1.658e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 15:01:18,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3553746.6666666665, ans=0.125 2023-11-28 15:02:00,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533100 2023-11-28 15:02:04,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3554013.3333333335, ans=0.07 2023-11-28 15:02:05,802 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4050, loss[loss=0.07703, simple_loss=0.118, pruned_loss=0.01234, audio_tagging_loss=0.005688, over 15170.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09159, pruned_loss=0.01258, audio_tagging_loss=0.008917, over 3037155.72 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:02:10,503 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:02:28,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3554080.0, ans=10.0 2023-11-28 15:03:01,395 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533150 2023-11-28 15:03:02,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3554280.0, ans=0.125 2023-11-28 15:03:06,487 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4100, loss[loss=0.06698, simple_loss=0.09037, pruned_loss=0.01456, audio_tagging_loss=0.007237, over 15227.00 frames. 
], tot_loss[loss=0.0678, simple_loss=0.09258, pruned_loss=0.01266, audio_tagging_loss=0.008855, over 3045784.39 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:03:09,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3554346.6666666665, ans=0.125 2023-11-28 15:03:16,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.818e+01 9.455e+01 1.012e+02 1.204e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 15:03:22,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3554413.3333333335, ans=0.125 2023-11-28 15:03:29,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=12.0 2023-11-28 15:03:29,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3554480.0, ans=0.125 2023-11-28 15:04:01,653 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533200 2023-11-28 15:04:07,016 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4150, loss[loss=0.05783, simple_loss=0.06574, pruned_loss=0.01304, audio_tagging_loss=0.01192, over 14216.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09211, pruned_loss=0.01263, audio_tagging_loss=0.008776, over 3043110.08 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:04:13,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-28 15:04:24,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3554746.6666666665, ans=0.2 2023-11-28 15:04:26,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3554746.6666666665, ans=0.125 2023-11-28 15:04:30,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2023-11-28 15:04:36,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3554813.3333333335, ans=0.2 2023-11-28 15:04:38,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3554813.3333333335, ans=0.0 2023-11-28 15:04:48,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3554880.0, ans=0.125 2023-11-28 15:04:52,385 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:04:54,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. 
limit=6.0 2023-11-28 15:04:57,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-28 15:05:00,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3554946.6666666665, ans=0.0 2023-11-28 15:05:01,834 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533250 2023-11-28 15:05:04,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554946.6666666665, ans=0.1 2023-11-28 15:05:05,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3555013.3333333335, ans=0.2 2023-11-28 15:05:06,755 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4200, loss[loss=0.07355, simple_loss=0.1029, pruned_loss=0.01084, audio_tagging_loss=0.01125, over 16075.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09215, pruned_loss=0.01255, audio_tagging_loss=0.008655, over 3041273.95 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:05:07,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2023-11-28 15:05:09,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3555013.3333333335, ans=0.0 2023-11-28 15:05:15,736 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.572e+01 9.503e+01 1.029e+02 1.274e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 15:05:40,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=22.5 2023-11-28 15:05:43,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3555213.3333333335, ans=0.125 2023-11-28 15:06:00,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533300 2023-11-28 15:06:04,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3555346.6666666665, ans=0.0 2023-11-28 15:06:05,301 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4250, loss[loss=0.07115, simple_loss=0.1021, pruned_loss=0.01145, audio_tagging_loss=0.00864, over 15323.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09156, pruned_loss=0.01255, audio_tagging_loss=0.008616, over 3041246.29 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:06:07,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.27 vs. 
limit=22.5 2023-11-28 15:06:08,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3555346.6666666665, ans=0.0 2023-11-28 15:06:08,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3555346.6666666665, ans=0.05 2023-11-28 15:06:10,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3555346.6666666665, ans=0.2 2023-11-28 15:06:28,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3555480.0, ans=0.2 2023-11-28 15:06:33,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3555480.0, ans=0.125 2023-11-28 15:06:45,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3555546.6666666665, ans=0.0 2023-11-28 15:06:55,780 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:07:00,039 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533350 2023-11-28 15:07:04,438 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4300, loss[loss=0.06905, simple_loss=0.09238, pruned_loss=0.01506, audio_tagging_loss=0.007808, over 15251.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09141, pruned_loss=0.01255, audio_tagging_loss=0.008549, over 3040616.14 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:07:04,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=22.5 2023-11-28 15:07:12,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3555680.0, ans=0.125 2023-11-28 15:07:13,801 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.132e+01 9.735e+01 1.057e+02 1.337e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 15:07:16,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3555746.6666666665, ans=0.1 2023-11-28 15:07:42,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2023-11-28 15:07:44,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3555880.0, ans=0.0 2023-11-28 15:07:48,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2023-11-28 15:07:58,886 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533400 2023-11-28 15:08:03,642 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4350, loss[loss=0.04029, simple_loss=0.04983, pruned_loss=0.005484, audio_tagging_loss=0.00989, over 14587.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09117, pruned_loss=0.01259, audio_tagging_loss=0.008593, over 3042838.91 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:08:18,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3556080.0, ans=0.0 2023-11-28 15:08:20,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3556080.0, ans=0.2 2023-11-28 15:08:44,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3556213.3333333335, ans=0.1 2023-11-28 15:08:54,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3556280.0, ans=0.125 2023-11-28 15:08:57,838 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533450 2023-11-28 15:09:02,395 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4400, loss[loss=0.06993, simple_loss=0.1029, pruned_loss=0.01124, audio_tagging_loss=0.007249, over 15936.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09121, pruned_loss=0.01247, audio_tagging_loss=0.008548, over 3038214.49 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:09:04,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3556346.6666666665, ans=0.0 2023-11-28 15:09:09,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-11-28 15:09:12,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.943e+01 9.727e+01 1.045e+02 1.586e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 15:09:16,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-11-28 15:09:48,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3556546.6666666665, ans=6.0 2023-11-28 15:10:02,158 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533500 2023-11-28 15:10:06,972 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4450, loss[loss=0.07138, simple_loss=0.09994, pruned_loss=0.01406, audio_tagging_loss=0.007348, over 15323.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09095, pruned_loss=0.01232, audio_tagging_loss=0.008486, over 3041391.92 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:10:13,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3556680.0, ans=0.125 2023-11-28 15:10:47,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.20 vs. 
limit=10.0 2023-11-28 15:11:01,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3556946.6666666665, ans=0.0 2023-11-28 15:11:04,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3556946.6666666665, ans=0.125 2023-11-28 15:11:06,022 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533550 2023-11-28 15:11:06,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3556946.6666666665, ans=0.1 2023-11-28 15:11:10,947 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4500, loss[loss=0.07641, simple_loss=0.1028, pruned_loss=0.01763, audio_tagging_loss=0.007364, over 15766.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09095, pruned_loss=0.01238, audio_tagging_loss=0.00837, over 3039976.30 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:11:22,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.792e+01 9.317e+01 1.000e+02 1.287e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 15:11:26,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2023-11-28 15:11:42,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3557146.6666666665, ans=0.125 2023-11-28 15:12:10,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533600 2023-11-28 15:12:13,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3557280.0, ans=0.5 2023-11-28 15:12:15,964 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4550, loss[loss=0.0633, simple_loss=0.09396, pruned_loss=0.009345, audio_tagging_loss=0.006973, over 15681.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09109, pruned_loss=0.01249, audio_tagging_loss=0.008405, over 3035300.69 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:12:38,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3557413.3333333335, ans=0.0 2023-11-28 15:12:44,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3557480.0, ans=0.0 2023-11-28 15:13:06,945 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:13:14,904 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533650 2023-11-28 15:13:19,714 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4600, loss[loss=0.06436, simple_loss=0.09341, pruned_loss=0.008392, audio_tagging_loss=0.009266, over 15012.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08993, pruned_loss=0.01228, audio_tagging_loss=0.008545, over 3040232.37 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:13:19,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3557680.0, ans=0.2 2023-11-28 15:13:22,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-11-28 15:13:29,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.928e+01 9.397e+01 1.022e+02 1.415e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 15:13:31,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3557746.6666666665, ans=0.125 2023-11-28 15:13:43,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2023-11-28 15:13:53,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3557813.3333333335, ans=0.2 2023-11-28 15:13:57,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3557880.0, ans=0.1 2023-11-28 15:13:59,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3557880.0, ans=0.0 2023-11-28 15:14:17,641 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533700 2023-11-28 15:14:22,800 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4650, loss[loss=0.05837, simple_loss=0.08373, pruned_loss=0.008512, audio_tagging_loss=0.007992, over 14892.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08965, pruned_loss=0.01229, audio_tagging_loss=0.008683, over 3041792.71 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:14:26,604 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:14:55,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3558146.6666666665, ans=0.125 2023-11-28 15:15:21,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2023-11-28 15:15:21,841 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533750 2023-11-28 15:15:27,136 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4700, loss[loss=0.07399, simple_loss=0.104, pruned_loss=0.01428, audio_tagging_loss=0.007687, over 15476.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08917, pruned_loss=0.01217, audio_tagging_loss=0.008742, over 3041119.34 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:15:28,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3558346.6666666665, ans=0.0 2023-11-28 15:15:38,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.852e+01 9.419e+01 1.023e+02 1.642e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 15:15:48,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3558413.3333333335, ans=0.125 2023-11-28 15:16:09,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3558546.6666666665, ans=0.2 2023-11-28 15:16:20,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3558613.3333333335, ans=0.2 2023-11-28 15:16:26,082 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533800 2023-11-28 15:16:31,083 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4750, loss[loss=0.05665, simple_loss=0.06453, pruned_loss=0.01083, audio_tagging_loss=0.01356, over 16459.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08873, pruned_loss=0.01213, audio_tagging_loss=0.008846, over 3038557.52 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:16:41,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3558680.0, ans=0.05 2023-11-28 15:16:49,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3558746.6666666665, ans=10.0 2023-11-28 15:16:50,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3558746.6666666665, ans=0.125 2023-11-28 15:17:06,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3558813.3333333335, ans=0.0 2023-11-28 15:17:07,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3558813.3333333335, ans=0.2 2023-11-28 15:17:21,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3558946.6666666665, ans=0.125 2023-11-28 15:17:22,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3558946.6666666665, ans=0.0 2023-11-28 15:17:29,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533850 2023-11-28 15:17:34,700 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4800, loss[loss=0.06546, simple_loss=0.09104, pruned_loss=0.009729, audio_tagging_loss=0.0102, over 16656.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.0893, pruned_loss=0.01207, audio_tagging_loss=0.008853, over 3044684.67 frames. 
], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:17:45,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 9.221e+01 9.663e+01 1.040e+02 1.234e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 15:18:11,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3559213.3333333335, ans=0.0 2023-11-28 15:18:18,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3559213.3333333335, ans=0.1 2023-11-28 15:18:26,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3559280.0, ans=0.0 2023-11-28 15:18:31,388 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533900 2023-11-28 15:18:36,118 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4850, loss[loss=0.05795, simple_loss=0.07285, pruned_loss=0.0121, audio_tagging_loss=0.00943, over 15062.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08987, pruned_loss=0.01229, audio_tagging_loss=0.008858, over 3040601.40 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:18:46,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3559346.6666666665, ans=0.0 2023-11-28 15:19:25,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.32 vs. limit=10.0 2023-11-28 15:19:26,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2023-11-28 15:19:26,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3559613.3333333335, ans=0.0 2023-11-28 15:19:33,312 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533950 2023-11-28 15:19:36,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-28 15:19:38,540 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4900, loss[loss=0.06737, simple_loss=0.09094, pruned_loss=0.01375, audio_tagging_loss=0.008144, over 14736.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08908, pruned_loss=0.01226, audio_tagging_loss=0.00893, over 3036259.60 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:19:38,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3559680.0, ans=0.1 2023-11-28 15:19:49,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 9.026e+01 9.693e+01 1.038e+02 1.259e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 15:19:53,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3559746.6666666665, ans=0.125 2023-11-28 15:19:56,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3559746.6666666665, ans=0.2 2023-11-28 15:19:58,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3559746.6666666665, ans=0.07 2023-11-28 15:20:34,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3559946.6666666665, ans=0.125 2023-11-28 15:20:35,421 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534000 2023-11-28 15:20:40,382 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4950, loss[loss=0.06324, simple_loss=0.08946, pruned_loss=0.01021, audio_tagging_loss=0.008297, over 14850.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08932, pruned_loss=0.01217, audio_tagging_loss=0.008753, over 3031164.53 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:20:47,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3560013.3333333335, ans=0.0 2023-11-28 15:20:50,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3560013.3333333335, ans=0.1 2023-11-28 15:21:00,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2023-11-28 15:21:12,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3560146.6666666665, ans=0.0 2023-11-28 15:21:20,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3560213.3333333335, ans=0.1 2023-11-28 15:21:37,522 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534050 2023-11-28 15:21:40,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3560280.0, ans=0.1 2023-11-28 15:21:42,796 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5000, loss[loss=0.07153, simple_loss=0.0928, pruned_loss=0.01544, audio_tagging_loss=0.00969, over 15719.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08993, pruned_loss=0.01226, audio_tagging_loss=0.008626, over 3037025.88 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:21:49,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. 
limit=22.5 2023-11-28 15:21:53,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.823e+01 9.586e+01 1.031e+02 1.320e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 15:21:56,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3560413.3333333335, ans=0.0 2023-11-28 15:22:13,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3560480.0, ans=0.2 2023-11-28 15:22:23,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3560546.6666666665, ans=0.1 2023-11-28 15:22:31,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3560613.3333333335, ans=0.0 2023-11-28 15:22:39,797 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534100 2023-11-28 15:22:45,169 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5050, loss[loss=0.06289, simple_loss=0.09228, pruned_loss=0.009435, audio_tagging_loss=0.007312, over 14913.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08885, pruned_loss=0.01206, audio_tagging_loss=0.008585, over 3045562.99 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:22:45,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3560680.0, ans=0.125 2023-11-28 15:23:01,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3560746.6666666665, ans=0.1 2023-11-28 15:23:14,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3560813.3333333335, ans=0.0 2023-11-28 15:23:39,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3560946.6666666665, ans=0.125 2023-11-28 15:23:41,436 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534150 2023-11-28 15:23:46,150 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5100, loss[loss=0.06812, simple_loss=0.09911, pruned_loss=0.01162, audio_tagging_loss=0.006947, over 15713.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08976, pruned_loss=0.01221, audio_tagging_loss=0.008555, over 3046142.25 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:23:48,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. 
limit=15.0 2023-11-28 15:23:58,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 9.022e+01 9.569e+01 1.030e+02 1.259e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 15:24:01,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3561080.0, ans=0.2 2023-11-28 15:24:29,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3561213.3333333335, ans=0.0 2023-11-28 15:24:43,892 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534200 2023-11-28 15:24:45,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3561280.0, ans=0.1 2023-11-28 15:24:48,764 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5150, loss[loss=0.05952, simple_loss=0.07676, pruned_loss=0.01057, audio_tagging_loss=0.01056, over 15082.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08916, pruned_loss=0.0121, audio_tagging_loss=0.008542, over 3039504.05 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:25:08,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0 2023-11-28 15:25:34,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3561546.6666666665, ans=0.1 2023-11-28 15:25:46,184 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534250 2023-11-28 15:25:46,454 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:25:50,866 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5200, loss[loss=0.05904, simple_loss=0.07689, pruned_loss=0.01122, audio_tagging_loss=0.009372, over 15181.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09072, pruned_loss=0.0124, audio_tagging_loss=0.008498, over 3037630.67 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:26:03,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.561e+01 9.249e+01 1.010e+02 1.176e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-28 15:26:19,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3561813.3333333335, ans=0.125 2023-11-28 15:26:48,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534300 2023-11-28 15:26:53,348 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5250, loss[loss=0.07677, simple_loss=0.09662, pruned_loss=0.02068, audio_tagging_loss=0.007784, over 17104.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09071, pruned_loss=0.01244, audio_tagging_loss=0.008441, over 3038187.54 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:27:03,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3562013.3333333335, ans=0.125 2023-11-28 15:27:09,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3562080.0, ans=0.0 2023-11-28 15:27:42,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. 
limit=6.0 2023-11-28 15:27:50,530 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534350 2023-11-28 15:27:55,103 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5300, loss[loss=0.06041, simple_loss=0.07742, pruned_loss=0.01209, audio_tagging_loss=0.009608, over 15546.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09063, pruned_loss=0.01244, audio_tagging_loss=0.008382, over 3034963.70 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:28:05,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3562346.6666666665, ans=0.2 2023-11-28 15:28:06,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3562413.3333333335, ans=0.1 2023-11-28 15:28:07,534 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.987e+01 9.599e+01 1.032e+02 1.281e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 15:28:11,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2023-11-28 15:28:26,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3562480.0, ans=0.125 2023-11-28 15:28:30,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3562480.0, ans=0.0 2023-11-28 15:28:30,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3562480.0, ans=0.0 2023-11-28 15:28:33,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3562546.6666666665, ans=0.09899494936611666 2023-11-28 15:28:38,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3562546.6666666665, ans=0.2 2023-11-28 15:28:52,850 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534400 2023-11-28 15:28:57,847 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5350, loss[loss=0.06856, simple_loss=0.09866, pruned_loss=0.009853, audio_tagging_loss=0.009383, over 14882.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09069, pruned_loss=0.01238, audio_tagging_loss=0.008407, over 3035535.43 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:29:05,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.11 vs. 
limit=22.5 2023-11-28 15:29:12,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3562746.6666666665, ans=0.1 2023-11-28 15:29:20,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3562746.6666666665, ans=0.0 2023-11-28 15:29:36,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3562880.0, ans=0.2 2023-11-28 15:29:42,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3562880.0, ans=0.125 2023-11-28 15:29:55,303 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534450 2023-11-28 15:29:59,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3563013.3333333335, ans=0.125 2023-11-28 15:29:59,987 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5400, loss[loss=0.06559, simple_loss=0.09222, pruned_loss=0.0112, audio_tagging_loss=0.008278, over 16047.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09106, pruned_loss=0.01238, audio_tagging_loss=0.008465, over 3041309.69 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:30:04,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0 2023-11-28 15:30:05,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3563013.3333333335, ans=0.125 2023-11-28 15:30:13,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3563080.0, ans=0.125 2023-11-28 15:30:14,154 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.799e+01 8.988e+01 9.532e+01 1.029e+02 1.170e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 15:30:19,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2023-11-28 15:30:52,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3563280.0, ans=0.0 2023-11-28 15:30:52,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3563280.0, ans=0.125 2023-11-28 15:30:57,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534500 2023-11-28 15:31:02,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=15.0 2023-11-28 15:31:02,810 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5450, loss[loss=0.06376, simple_loss=0.0843, pruned_loss=0.01264, audio_tagging_loss=0.008971, over 13929.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09098, pruned_loss=0.01252, audio_tagging_loss=0.008497, over 3038765.05 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:31:10,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3563346.6666666665, ans=0.0 2023-11-28 15:31:39,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. 
limit=15.0 2023-11-28 15:32:00,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534550 2023-11-28 15:32:04,866 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5500, loss[loss=0.0561, simple_loss=0.07147, pruned_loss=0.008251, audio_tagging_loss=0.01212, over 16159.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09075, pruned_loss=0.01239, audio_tagging_loss=0.008592, over 3045930.58 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:32:13,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2023-11-28 15:32:18,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.919e+01 9.709e+01 1.036e+02 2.693e+02, threshold=1.942e+02, percent-clipped=1.0 2023-11-28 15:32:23,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3563746.6666666665, ans=0.0 2023-11-28 15:32:39,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3563813.3333333335, ans=0.125 2023-11-28 15:32:52,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0 2023-11-28 15:33:01,915 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534600 2023-11-28 15:33:06,887 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5550, loss[loss=0.06644, simple_loss=0.08809, pruned_loss=0.01293, audio_tagging_loss=0.009465, over 15623.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09115, pruned_loss=0.01238, audio_tagging_loss=0.008654, over 3046505.99 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:33:08,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3564013.3333333335, ans=0.0 2023-11-28 15:33:16,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3564013.3333333335, ans=0.125 2023-11-28 15:33:29,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3564080.0, ans=0.0 2023-11-28 15:33:42,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3564146.6666666665, ans=0.0 2023-11-28 15:33:58,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3564280.0, ans=0.125 2023-11-28 15:34:03,848 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534650 2023-11-28 15:34:09,198 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5600, loss[loss=0.07059, simple_loss=0.08512, pruned_loss=0.01927, audio_tagging_loss=0.008766, over 15194.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09049, pruned_loss=0.01214, audio_tagging_loss=0.008799, over 3050221.86 frames. 
], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:34:23,314 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 8.970e+01 9.641e+01 1.032e+02 1.547e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 15:34:27,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3564413.3333333335, ans=0.125 2023-11-28 15:34:27,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2023-11-28 15:34:30,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3564413.3333333335, ans=0.125 2023-11-28 15:34:37,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3564480.0, ans=0.0 2023-11-28 15:34:39,600 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:34:56,196 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:35:06,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3564613.3333333335, ans=0.07 2023-11-28 15:35:06,971 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534700 2023-11-28 15:35:11,625 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5650, loss[loss=0.07708, simple_loss=0.1001, pruned_loss=0.01645, audio_tagging_loss=0.01056, over 16056.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09047, pruned_loss=0.01216, audio_tagging_loss=0.008891, over 3052413.79 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:35:19,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3564680.0, ans=0.95 2023-11-28 15:35:20,081 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:35:28,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-11-28 15:35:35,313 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:35:37,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.05 vs. 
limit=12.0 2023-11-28 15:36:09,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534750 2023-11-28 15:36:10,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3564946.6666666665, ans=0.1 2023-11-28 15:36:12,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3565013.3333333335, ans=0.0 2023-11-28 15:36:13,948 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5700, loss[loss=0.07721, simple_loss=0.1038, pruned_loss=0.01544, audio_tagging_loss=0.009851, over 15941.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09026, pruned_loss=0.01201, audio_tagging_loss=0.008864, over 3054674.09 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:36:16,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3565013.3333333335, ans=0.125 2023-11-28 15:36:28,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.736e+01 9.435e+01 1.001e+02 1.261e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 15:36:32,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3565080.0, ans=0.5 2023-11-28 15:36:35,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3565080.0, ans=0.125 2023-11-28 15:36:44,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-28 15:36:47,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-11-28 15:36:52,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3565213.3333333335, ans=0.0 2023-11-28 15:37:02,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3565213.3333333335, ans=0.2 2023-11-28 15:37:04,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3565280.0, ans=0.125 2023-11-28 15:37:11,492 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534800 2023-11-28 15:37:16,411 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5750, loss[loss=0.06914, simple_loss=0.09572, pruned_loss=0.01496, audio_tagging_loss=0.006325, over 14767.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08942, pruned_loss=0.01188, audio_tagging_loss=0.008746, over 3054964.72 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:37:18,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3565346.6666666665, ans=0.1 2023-11-28 15:37:39,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3565413.3333333335, ans=0.2 2023-11-28 15:37:43,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3565480.0, ans=0.125 2023-11-28 15:37:46,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3565480.0, ans=0.125 2023-11-28 15:37:47,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3565480.0, ans=0.07 2023-11-28 15:37:54,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3565546.6666666665, ans=0.0 2023-11-28 15:38:00,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3565546.6666666665, ans=0.0 2023-11-28 15:38:06,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2023-11-28 15:38:14,213 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534850 2023-11-28 15:38:20,316 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5800, loss[loss=0.07, simple_loss=0.09887, pruned_loss=0.01346, audio_tagging_loss=0.007108, over 15039.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09009, pruned_loss=0.01213, audio_tagging_loss=0.008637, over 3059393.24 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:38:31,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3565746.6666666665, ans=0.2 2023-11-28 15:38:35,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.604e+01 9.339e+01 1.000e+02 1.373e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 15:38:40,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3565746.6666666665, ans=0.2 2023-11-28 15:38:49,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3565813.3333333335, ans=0.09899494936611666 2023-11-28 15:38:52,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3565813.3333333335, ans=0.2 2023-11-28 15:38:58,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3565880.0, ans=0.125 2023-11-28 15:39:17,045 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534900 2023-11-28 15:39:17,248 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:39:18,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3565946.6666666665, ans=0.125 2023-11-28 15:39:22,247 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5850, loss[loss=0.06957, simple_loss=0.09739, pruned_loss=0.01127, audio_tagging_loss=0.009616, over 15201.00 frames. 
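
Batch sizes in these records drift between roughly 52 and 63 cuts while the per-batch frame counts stay near 14k-16k. That pattern is what duration-capped sampling produces: cuts are packed into a batch until a total-duration budget is reached, so the cut count, not the frame total, is what varies. A sketch of the packing rule; the budget value and names are illustrative assumptions, not taken from the sampler:

    # Sketch of duration-capped batching: pack cuts until a seconds
    # budget is hit, so batch *size* varies while frames stay level.
    from typing import Iterable, Iterator, List, Tuple

    def duration_batches(cuts: Iterable[Tuple[str, float]],
                         max_duration: float) -> Iterator[List[str]]:
        batch, total = [], 0.0
        for cut_id, seconds in cuts:
            if batch and total + seconds > max_duration:
                yield batch          # budget reached: emit the batch
                batch, total = [], 0.0
            batch.append(cut_id)
            total += seconds
        if batch:
            yield batch              # flush the final partial batch
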
], tot_loss[loss=0.0656, simple_loss=0.08979, pruned_loss=0.01205, audio_tagging_loss=0.008653, over 3047177.21 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:39:28,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3566013.3333333335, ans=0.0 2023-11-28 15:39:40,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3566080.0, ans=0.125 2023-11-28 15:40:03,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3566213.3333333335, ans=0.04949747468305833 2023-11-28 15:40:15,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3566280.0, ans=0.125 2023-11-28 15:40:19,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534950 2023-11-28 15:40:20,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3566280.0, ans=0.025 2023-11-28 15:40:24,063 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5900, loss[loss=0.06585, simple_loss=0.08922, pruned_loss=0.01353, audio_tagging_loss=0.007715, over 15556.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08984, pruned_loss=0.01205, audio_tagging_loss=0.008583, over 3050801.55 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:40:26,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3566346.6666666665, ans=0.95 2023-11-28 15:40:30,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3566346.6666666665, ans=0.09899494936611666 2023-11-28 15:40:39,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.038e+01 9.170e+01 9.658e+01 1.028e+02 1.325e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 15:40:53,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3566480.0, ans=0.125 2023-11-28 15:41:21,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535000 2023-11-28 15:41:21,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3566613.3333333335, ans=0.0 2023-11-28 15:41:25,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3566680.0, ans=0.125 2023-11-28 15:41:26,972 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5950, loss[loss=0.07828, simple_loss=0.1036, pruned_loss=0.01536, audio_tagging_loss=0.01112, over 16231.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08922, pruned_loss=0.01203, audio_tagging_loss=0.008644, over 3051705.47 frames. 
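
Each "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." record lists five quantiles (min / 25% / median / 75% / max) of recent gradient norms, and the printed threshold equals clipping_scale times the median (2.0 * 9.658e+01 = 1.932e+02 in the entry above), so the clipping threshold tracks a running history of norms rather than a fixed constant. A sketch of that rule, with the history length and class name assumed:

    # Sketch: adaptive clipping threshold = scale * median of recent
    # gradient norms, as the logged quartiles/threshold suggest.
    from collections import deque
    import statistics

    class QuartileClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.scale = clipping_scale
            self.norms = deque(maxlen=history)  # rolling norm history

        def threshold(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            return self.scale * statistics.median(self.norms)

    clip = QuartileClipper()
    for norm in (80.4, 91.7, 96.6, 102.8, 132.5):  # quartiles above
        t = clip.threshold(norm)
    print(t)  # ~1.93e+02, i.e. 2x the running median
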
], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:41:44,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3566746.6666666665, ans=0.1 2023-11-28 15:41:51,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3566813.3333333335, ans=0.125 2023-11-28 15:42:05,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3566880.0, ans=0.1 2023-11-28 15:42:24,329 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535050 2023-11-28 15:42:29,584 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6000, loss[loss=0.05349, simple_loss=0.0681, pruned_loss=0.006959, audio_tagging_loss=0.01248, over 15543.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08814, pruned_loss=0.01191, audio_tagging_loss=0.008731, over 3049227.25 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:42:29,586 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 15:43:07,309 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05761, simple_loss=0.05049, pruned_loss=0.005188, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-28 15:43:07,309 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 15:43:22,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.761e+01 9.402e+01 1.021e+02 1.330e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 15:43:54,044 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:43:59,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3567280.0, ans=0.1 2023-11-28 15:44:04,898 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535100 2023-11-28 15:44:09,587 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6050, loss[loss=0.07338, simple_loss=0.1001, pruned_loss=0.01621, audio_tagging_loss=0.007129, over 13749.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08894, pruned_loss=0.01204, audio_tagging_loss=0.00867, over 3045799.46 frames. ], batch size: 52, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:44:09,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3567346.6666666665, ans=0.125 2023-11-28 15:44:16,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3567346.6666666665, ans=0.125 2023-11-28 15:44:16,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3567346.6666666665, ans=0.2 2023-11-28 15:44:17,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.66 vs. 
limit=10.0 2023-11-28 15:44:37,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=15.0 2023-11-28 15:44:51,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2023-11-28 15:45:07,118 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535150 2023-11-28 15:45:12,328 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6100, loss[loss=0.07486, simple_loss=0.1061, pruned_loss=0.01594, audio_tagging_loss=0.005869, over 15003.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0891, pruned_loss=0.01216, audio_tagging_loss=0.008622, over 3048123.77 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:45:26,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3567746.6666666665, ans=0.2 2023-11-28 15:45:27,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.963e+01 9.572e+01 1.025e+02 1.368e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 15:45:27,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3567746.6666666665, ans=0.1 2023-11-28 15:45:51,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3567880.0, ans=0.2 2023-11-28 15:46:04,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3567946.6666666665, ans=0.125 2023-11-28 15:46:09,248 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535200 2023-11-28 15:46:14,253 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6150, loss[loss=0.04906, simple_loss=0.06478, pruned_loss=0.006901, audio_tagging_loss=0.009768, over 15609.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0895, pruned_loss=0.01217, audio_tagging_loss=0.008605, over 3053514.13 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:46:28,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3568080.0, ans=0.125 2023-11-28 15:46:39,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3568146.6666666665, ans=0.125 2023-11-28 15:46:39,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.06 vs. limit=22.5 2023-11-28 15:46:52,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3568213.3333333335, ans=0.125 2023-11-28 15:46:54,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0 2023-11-28 15:47:11,738 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535250 2023-11-28 15:47:17,029 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6200, loss[loss=0.07115, simple_loss=0.09737, pruned_loss=0.01509, audio_tagging_loss=0.007374, over 13869.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08903, pruned_loss=0.0121, audio_tagging_loss=0.008616, over 3050344.31 frames. 
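
The WARNING entries that exclude cuts are internally consistent: 100 input frames become 23 after the roughly 4x subsampling (((100 - 7) // 2 + 1) // 2 = 23), which is shorter than the 24-token dummy transcript, and transducer training needs at least as many output frames as tokens to have a valid alignment. A sketch of such a filter; the subsampling formula is inferred from the logged 100 -> 23 and should be treated as an assumption:

    # Sketch of the cut filter implied by the WARNINGs: drop utterances
    # whose subsampled length is shorter than the token sequence.
    from typing import List

    def frames_after_subsampling(num_frames: int) -> int:
        # Inferred from the log (100 frames -> 23); assumed formula.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, tokens: List[str]) -> bool:
        return frames_after_subsampling(num_frames) >= len(tokens)

    print(frames_after_subsampling(100))   # 23
    print(keep_cut(100, ["tok"] * 24))     # False -> excluded
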
], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:47:21,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=15.0 2023-11-28 15:47:22,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3568346.6666666665, ans=0.125 2023-11-28 15:47:23,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3568346.6666666665, ans=0.0 2023-11-28 15:47:33,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.897e+01 8.984e+01 9.633e+01 1.042e+02 1.273e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 15:47:38,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3568413.3333333335, ans=0.125 2023-11-28 15:48:14,124 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535300 2023-11-28 15:48:15,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3568613.3333333335, ans=0.1 2023-11-28 15:48:19,434 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6250, loss[loss=0.0752, simple_loss=0.103, pruned_loss=0.01598, audio_tagging_loss=0.007742, over 14596.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08868, pruned_loss=0.01195, audio_tagging_loss=0.00872, over 3048515.89 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:48:19,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3568680.0, ans=0.125 2023-11-28 15:48:20,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3568680.0, ans=0.2 2023-11-28 15:48:34,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3568746.6666666665, ans=0.1 2023-11-28 15:48:36,730 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:48:40,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3568746.6666666665, ans=0.125 2023-11-28 15:48:43,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3568813.3333333335, ans=0.1 2023-11-28 15:49:06,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3568880.0, ans=0.2 2023-11-28 15:49:14,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=12.0 2023-11-28 15:49:16,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.49 vs. limit=10.0 2023-11-28 15:49:16,922 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535350 2023-11-28 15:49:21,434 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6300, loss[loss=0.05862, simple_loss=0.0739, pruned_loss=0.009278, audio_tagging_loss=0.01239, over 14123.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.0884, pruned_loss=0.01186, audio_tagging_loss=0.00891, over 3044159.20 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:49:35,291 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:49:38,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.816e+01 9.438e+01 1.010e+02 1.307e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 15:49:46,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2023-11-28 15:50:03,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=12.0 2023-11-28 15:50:06,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3569213.3333333335, ans=0.125 2023-11-28 15:50:18,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3569280.0, ans=0.125 2023-11-28 15:50:19,281 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535400 2023-11-28 15:50:24,766 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6350, loss[loss=0.06226, simple_loss=0.08656, pruned_loss=0.01065, audio_tagging_loss=0.008328, over 15197.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.0886, pruned_loss=0.01201, audio_tagging_loss=0.008921, over 3042037.98 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:50:26,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0 2023-11-28 15:50:44,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3569413.3333333335, ans=0.125 2023-11-28 15:51:21,821 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535450 2023-11-28 15:51:26,562 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6400, loss[loss=0.05473, simple_loss=0.0788, pruned_loss=0.007313, audio_tagging_loss=0.008022, over 14598.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08862, pruned_loss=0.0121, audio_tagging_loss=0.008918, over 3040365.54 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:51:43,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.811e+01 9.428e+01 1.008e+02 1.163e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 15:52:07,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=22.5 2023-11-28 15:52:24,890 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535500 2023-11-28 15:52:28,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2023-11-28 15:52:30,151 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6450, loss[loss=0.0685, simple_loss=0.09736, pruned_loss=0.01142, audio_tagging_loss=0.008396, over 16016.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08883, pruned_loss=0.01202, audio_tagging_loss=0.008965, over 3041693.76 frames. 
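
The ScheduledFloat records report the current value ("ans") of a named hyperparameter at the given batch_count; by this point in training the skip rates read 0.0 and the dropout values 0.1, i.e. the schedules have long since reached their endpoints. A sketch of a piecewise-linear schedule keyed on batch count; the breakpoints below are invented for illustration only:

    # Sketch: a float hyperparameter interpolated piecewise-linearly
    # in batch_count; the breakpoints here are made up.
    from typing import Sequence, Tuple

    def scheduled_float(schedule: Sequence[Tuple[float, float]],
                        batch_count: float) -> float:
        # schedule: sorted (batch_count, value) breakpoints.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                f = (batch_count - x0) / (x1 - x0)
                return y0 + f * (y1 - y0)
        return schedule[-1][1]  # past the last breakpoint: hold final value

    # e.g. a skip rate annealed to 0.0 early in training:
    sched = [(0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)]
    print(scheduled_float(sched, 3565013.0))  # 0.0, as in the log
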
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:52:51,964 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:52:53,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3570080.0, ans=0.125 2023-11-28 15:53:06,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570213.3333333335, ans=0.1 2023-11-28 15:53:14,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570213.3333333335, ans=0.1 2023-11-28 15:53:20,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3570280.0, ans=0.2 2023-11-28 15:53:23,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2023-11-28 15:53:28,098 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535550 2023-11-28 15:53:32,778 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6500, loss[loss=0.06723, simple_loss=0.08457, pruned_loss=0.01299, audio_tagging_loss=0.01196, over 16711.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.089, pruned_loss=0.01197, audio_tagging_loss=0.008977, over 3054040.39 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:53:42,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3570346.6666666665, ans=0.125 2023-11-28 15:53:45,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3570413.3333333335, ans=0.0 2023-11-28 15:53:48,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 8.856e+01 9.321e+01 9.995e+01 1.237e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 15:54:09,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3570546.6666666665, ans=0.125 2023-11-28 15:54:10,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3570546.6666666665, ans=0.125 2023-11-28 15:54:11,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3570546.6666666665, ans=0.0 2023-11-28 15:54:30,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535600 2023-11-28 15:54:35,612 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6550, loss[loss=0.0821, simple_loss=0.1176, pruned_loss=0.01809, audio_tagging_loss=0.005203, over 16190.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08951, pruned_loss=0.01211, audio_tagging_loss=0.008723, over 3055108.63 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:54:40,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.29 vs. 
limit=15.0 2023-11-28 15:55:10,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3570813.3333333335, ans=0.0 2023-11-28 15:55:11,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3570813.3333333335, ans=0.0 2023-11-28 15:55:18,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3570880.0, ans=0.125 2023-11-28 15:55:33,435 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535650 2023-11-28 15:55:38,034 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6600, loss[loss=0.06177, simple_loss=0.08296, pruned_loss=0.01221, audio_tagging_loss=0.00808, over 15150.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08881, pruned_loss=0.01208, audio_tagging_loss=0.008688, over 3051382.21 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:55:55,578 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.913e+01 9.644e+01 1.048e+02 1.369e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 15:55:56,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2023-11-28 15:56:04,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3571146.6666666665, ans=0.0 2023-11-28 15:56:26,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3571280.0, ans=0.1 2023-11-28 15:56:34,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535700 2023-11-28 15:56:40,922 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6650, loss[loss=0.07095, simple_loss=0.1018, pruned_loss=0.01497, audio_tagging_loss=0.00506, over 15740.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08908, pruned_loss=0.01222, audio_tagging_loss=0.008595, over 3057944.62 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:56:45,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.54 vs. limit=10.0 2023-11-28 15:56:46,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=12.0 2023-11-28 15:56:51,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3571346.6666666665, ans=0.1 2023-11-28 15:56:55,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3571413.3333333335, ans=0.125 2023-11-28 15:57:23,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. 
limit=15.0 2023-11-28 15:57:25,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3571546.6666666665, ans=0.125 2023-11-28 15:57:32,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3571613.3333333335, ans=0.1 2023-11-28 15:57:32,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-28 15:57:38,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535750 2023-11-28 15:57:39,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2023-11-28 15:57:42,789 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6700, loss[loss=0.06178, simple_loss=0.08899, pruned_loss=0.00768, audio_tagging_loss=0.009604, over 15975.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.0888, pruned_loss=0.01223, audio_tagging_loss=0.008583, over 3046898.45 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:57:49,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3571680.0, ans=0.0 2023-11-28 15:57:59,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.880e+01 9.649e+01 1.029e+02 1.368e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 15:58:00,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=22.5 2023-11-28 15:58:19,673 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:58:20,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3571880.0, ans=0.2 2023-11-28 15:58:39,968 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535800 2023-11-28 15:58:45,029 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6750, loss[loss=0.07294, simple_loss=0.09971, pruned_loss=0.01333, audio_tagging_loss=0.009754, over 15844.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08879, pruned_loss=0.01223, audio_tagging_loss=0.008638, over 3051208.16 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:59:09,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3572146.6666666665, ans=0.04949747468305833 2023-11-28 15:59:42,171 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535850 2023-11-28 15:59:43,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3572280.0, ans=0.0 2023-11-28 15:59:47,483 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6800, loss[loss=0.06631, simple_loss=0.08955, pruned_loss=0.01371, audio_tagging_loss=0.00783, over 15221.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08913, pruned_loss=0.01229, audio_tagging_loss=0.008587, over 3056382.65 frames. 
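
grad_scale in the per-batch records moves between 16.0 and 32.0 over this stretch. With fp16 training that is the standard dynamic-loss-scaling pattern: halve the scale when a batch overflows (inf/nan gradients), and try doubling again after a run of clean batches. A sketch of that state machine; the growth interval and factors are assumptions, not values read from the scaler:

    # Sketch of dynamic fp16 loss scaling, matching the 16 <-> 32
    # movement of grad_scale in the log; constants are assumed.
    class GradScale:
        def __init__(self, scale: float = 32.0, growth_interval: int = 200):
            self.scale = scale
            self.growth_interval = growth_interval
            self.clean_batches = 0

        def update(self, found_inf: bool) -> float:
            if found_inf:
                self.scale *= 0.5        # back off on overflow
                self.clean_batches = 0
            else:
                self.clean_batches += 1
                if self.clean_batches % self.growth_interval == 0:
                    self.scale *= 2.0    # probe a larger scale again
            return self.scale
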
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:59:50,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3572346.6666666665, ans=0.125 2023-11-28 16:00:04,698 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.079e+01 9.608e+01 1.020e+02 1.284e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 16:00:09,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3572413.3333333335, ans=0.025 2023-11-28 16:00:26,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3572546.6666666665, ans=0.0 2023-11-28 16:00:29,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3572546.6666666665, ans=0.0 2023-11-28 16:00:45,830 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535900 2023-11-28 16:00:49,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3572680.0, ans=0.2 2023-11-28 16:00:50,602 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6850, loss[loss=0.08301, simple_loss=0.1113, pruned_loss=0.01695, audio_tagging_loss=0.01039, over 14526.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09002, pruned_loss=0.01253, audio_tagging_loss=0.008522, over 3053776.10 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:00:53,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3572680.0, ans=0.2 2023-11-28 16:01:01,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3572746.6666666665, ans=0.07 2023-11-28 16:01:15,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3572813.3333333335, ans=0.125 2023-11-28 16:01:17,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3572813.3333333335, ans=0.125 2023-11-28 16:01:25,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3572880.0, ans=0.1 2023-11-28 16:01:46,894 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535950 2023-11-28 16:01:52,234 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6900, loss[loss=0.05318, simple_loss=0.06588, pruned_loss=0.01267, audio_tagging_loss=0.007571, over 13508.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09001, pruned_loss=0.01248, audio_tagging_loss=0.008588, over 3046929.03 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:02:10,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.778e+01 9.481e+01 1.016e+02 1.477e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 16:02:18,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3573146.6666666665, ans=0.125 2023-11-28 16:02:28,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3573146.6666666665, ans=0.1 2023-11-28 16:02:29,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.84 vs. 
limit=15.0 2023-11-28 16:02:45,209 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:02:49,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3573280.0, ans=0.125 2023-11-28 16:02:51,088 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536000 2023-11-28 16:02:57,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3573346.6666666665, ans=0.2 2023-11-28 16:02:58,690 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6950, loss[loss=0.07365, simple_loss=0.1064, pruned_loss=0.01353, audio_tagging_loss=0.006935, over 15059.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09001, pruned_loss=0.0124, audio_tagging_loss=0.008591, over 3046720.20 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:03:05,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.63 vs. limit=6.0 2023-11-28 16:03:11,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3573413.3333333335, ans=0.1 2023-11-28 16:03:14,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3573413.3333333335, ans=0.0 2023-11-28 16:03:15,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3573413.3333333335, ans=0.1 2023-11-28 16:03:56,620 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536050 2023-11-28 16:04:01,248 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7000, loss[loss=0.07653, simple_loss=0.1095, pruned_loss=0.01597, audio_tagging_loss=0.005812, over 14835.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08993, pruned_loss=0.01234, audio_tagging_loss=0.008543, over 3041529.75 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:04:03,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2023-11-28 16:04:04,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.53 vs. 
limit=12.0 2023-11-28 16:04:06,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3573680.0, ans=0.125 2023-11-28 16:04:10,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3573680.0, ans=0.125 2023-11-28 16:04:12,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3573746.6666666665, ans=0.1 2023-11-28 16:04:18,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.731e+01 9.457e+01 1.025e+02 1.203e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 16:04:29,904 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:04:40,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3573880.0, ans=0.1 2023-11-28 16:04:47,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3573880.0, ans=0.0 2023-11-28 16:04:58,810 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536100 2023-11-28 16:05:03,921 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7050, loss[loss=0.06151, simple_loss=0.07306, pruned_loss=0.01249, audio_tagging_loss=0.01249, over 14493.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09001, pruned_loss=0.01237, audio_tagging_loss=0.008694, over 3051775.95 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:05:05,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3574013.3333333335, ans=0.125 2023-11-28 16:05:10,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-28 16:05:19,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3574080.0, ans=0.0 2023-11-28 16:05:25,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-11-28 16:05:33,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3574146.6666666665, ans=0.125 2023-11-28 16:05:45,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3574213.3333333335, ans=0.125 2023-11-28 16:05:50,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3574213.3333333335, ans=0.125 2023-11-28 16:05:52,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3574280.0, ans=0.125 2023-11-28 16:06:00,983 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536150 2023-11-28 16:06:05,768 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7100, loss[loss=0.05515, simple_loss=0.07827, pruned_loss=0.008424, audio_tagging_loss=0.007587, over 14443.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09047, pruned_loss=0.0123, audio_tagging_loss=0.008783, over 3060221.95 frames. 
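
Many of the scheduled names end in skip_rate (attention_skip_rate, conv_skip_rate, bypass.skip_rate, ff2_skip_rate, ...). A natural reading is that each is the probability of dropping that submodule's residual contribution during training, annealed to 0.0 (as logged here) once training is mature. A sketch of that interpretation; this is an assumption about the modules, not their actual code:

    # Sketch: stochastically skip a submodule's residual contribution
    # with probability skip_rate during training (assumed semantics).
    import torch

    def maybe_skip(x: torch.Tensor, delta: torch.Tensor,
                   skip_rate: float, training: bool) -> torch.Tensor:
        if training and torch.rand(()) < skip_rate:
            return x           # drop this module's contribution
        return x + delta       # normal residual update

    x = torch.zeros(4)
    print(maybe_skip(x, torch.ones(4), skip_rate=0.0, training=True))
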
], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:06:10,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3574346.6666666665, ans=0.125 2023-11-28 16:06:13,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3574346.6666666665, ans=0.2 2023-11-28 16:06:21,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3574413.3333333335, ans=0.1 2023-11-28 16:06:23,790 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.873e+01 8.973e+01 9.710e+01 1.043e+02 1.342e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 16:06:45,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3574546.6666666665, ans=0.0 2023-11-28 16:06:59,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3574613.3333333335, ans=0.125 2023-11-28 16:07:04,279 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536200 2023-11-28 16:07:08,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3574680.0, ans=0.125 2023-11-28 16:07:09,598 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7150, loss[loss=0.0841, simple_loss=0.1232, pruned_loss=0.01661, audio_tagging_loss=0.005868, over 15520.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09011, pruned_loss=0.01222, audio_tagging_loss=0.00883, over 3047380.23 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:07:52,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3574880.0, ans=0.125 2023-11-28 16:08:07,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536250 2023-11-28 16:08:10,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3574946.6666666665, ans=0.1 2023-11-28 16:08:12,286 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7200, loss[loss=0.07407, simple_loss=0.1027, pruned_loss=0.01447, audio_tagging_loss=0.008279, over 16081.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09143, pruned_loss=0.01241, audio_tagging_loss=0.008691, over 3042368.40 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:08:22,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3575013.3333333335, ans=0.0 2023-11-28 16:08:29,530 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.876e+01 9.486e+01 1.031e+02 1.370e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 16:08:29,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3575080.0, ans=0.125 2023-11-28 16:09:10,343 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536300 2023-11-28 16:09:14,984 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7250, loss[loss=0.05653, simple_loss=0.06333, pruned_loss=0.01118, audio_tagging_loss=0.01369, over 15059.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09115, pruned_loss=0.01234, audio_tagging_loss=0.008776, over 3039579.19 frames. 
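
The Whitening records compare a measured metric against a limit (for example "metric=13.33 vs. limit=22.5" above) and, going by the names (whiten_keys, out_whiten), the metric summarizes how far a group of channels is from having a white covariance. One plausible metric is the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the group covariance, which is 1.0 for perfectly white features and grows as variance concentrates in few directions; the exact formula in scaling.py may differ, so treat this sketch as an assumption:

    # Sketch of a per-group whitening metric: ~1 for white covariance,
    # larger when variance concentrates. The real formula may differ.
    import torch

    def whitening_metric(feats: torch.Tensor, num_groups: int) -> float:
        # feats: (num_frames, num_channels)
        n, c = feats.shape
        g = c // num_groups
        metrics = []
        for i in range(num_groups):
            x = feats[:, i * g:(i + 1) * g]
            x = x - x.mean(dim=0)
            cov = (x.T @ x) / n
            eig = torch.linalg.eigvalsh(cov)
            metrics.append(((eig ** 2).mean() / eig.mean() ** 2).item())
        return max(metrics)

    torch.manual_seed(0)
    # Slightly above 1 for finite-sample white noise:
    print(whitening_metric(torch.randn(1024, 384), num_groups=1))
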
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:09:18,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3575346.6666666665, ans=0.1 2023-11-28 16:09:32,975 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:09:42,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3575480.0, ans=0.0 2023-11-28 16:10:10,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3575613.3333333335, ans=0.0 2023-11-28 16:10:11,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3575613.3333333335, ans=0.125 2023-11-28 16:10:12,450 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536350 2023-11-28 16:10:17,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3575680.0, ans=15.0 2023-11-28 16:10:17,732 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7300, loss[loss=0.08199, simple_loss=0.1132, pruned_loss=0.01613, audio_tagging_loss=0.009281, over 16363.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09093, pruned_loss=0.01233, audio_tagging_loss=0.008754, over 3040144.22 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:10:20,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3575680.0, ans=0.125 2023-11-28 16:10:31,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2023-11-28 16:10:34,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.874e+01 9.514e+01 1.009e+02 1.390e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 16:10:43,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3575813.3333333335, ans=0.1 2023-11-28 16:10:43,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3575813.3333333335, ans=0.0 2023-11-28 16:10:45,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3575813.3333333335, ans=0.0 2023-11-28 16:10:57,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-28 16:10:59,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3575880.0, ans=0.05 2023-11-28 16:10:59,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=22.5 2023-11-28 16:11:14,302 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536400 2023-11-28 16:11:19,285 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7350, loss[loss=0.06014, simple_loss=0.07477, pruned_loss=0.01323, audio_tagging_loss=0.009524, over 14819.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08991, pruned_loss=0.01228, audio_tagging_loss=0.008682, over 3048805.65 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:11:21,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3576013.3333333335, ans=0.0 2023-11-28 16:11:34,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3576080.0, ans=0.0 2023-11-28 16:11:43,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3576146.6666666665, ans=0.125 2023-11-28 16:12:03,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3576213.3333333335, ans=0.0 2023-11-28 16:12:16,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3576280.0, ans=0.0 2023-11-28 16:12:17,244 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536450 2023-11-28 16:12:22,024 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7400, loss[loss=0.06522, simple_loss=0.0889, pruned_loss=0.0125, audio_tagging_loss=0.008269, over 14711.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08992, pruned_loss=0.01228, audio_tagging_loss=0.008717, over 3040539.28 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:12:32,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=22.5 2023-11-28 16:12:41,961 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.849e+01 9.470e+01 1.002e+02 1.238e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 16:13:08,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3576546.6666666665, ans=0.1 2023-11-28 16:13:11,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3576613.3333333335, ans=0.125 2023-11-28 16:13:12,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3576613.3333333335, ans=0.125 2023-11-28 16:13:18,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3576613.3333333335, ans=0.125 2023-11-28 16:13:19,318 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536500 2023-11-28 16:13:23,927 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7450, loss[loss=0.04866, simple_loss=0.06348, pruned_loss=0.005505, audio_tagging_loss=0.01141, over 14906.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08999, pruned_loss=0.0124, audio_tagging_loss=0.008695, over 3034074.76 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:13:26,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2023-11-28 16:13:35,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. 
limit=15.0 2023-11-28 16:13:37,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3576746.6666666665, ans=0.025 2023-11-28 16:13:59,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3576813.3333333335, ans=0.0 2023-11-28 16:14:07,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3576880.0, ans=0.0 2023-11-28 16:14:16,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3576946.6666666665, ans=0.125 2023-11-28 16:14:19,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3576946.6666666665, ans=0.125 2023-11-28 16:14:20,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3576946.6666666665, ans=0.125 2023-11-28 16:14:21,952 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536550 2023-11-28 16:14:22,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3576946.6666666665, ans=0.0 2023-11-28 16:14:26,694 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7500, loss[loss=0.07719, simple_loss=0.1086, pruned_loss=0.01463, audio_tagging_loss=0.008258, over 15042.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09042, pruned_loss=0.01241, audio_tagging_loss=0.008631, over 3041293.94 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:14:30,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3577013.3333333335, ans=0.125 2023-11-28 16:14:47,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.969e+01 9.602e+01 1.048e+02 1.798e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 16:15:17,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-28 16:15:24,944 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536600 2023-11-28 16:15:29,960 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7550, loss[loss=0.06586, simple_loss=0.09157, pruned_loss=0.01157, audio_tagging_loss=0.0085, over 15178.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08963, pruned_loss=0.01234, audio_tagging_loss=0.008639, over 3046312.66 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:15:37,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-28 16:15:53,245 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:16:27,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536650 2023-11-28 16:16:28,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3577613.3333333335, ans=0.0 2023-11-28 16:16:32,314 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7600, loss[loss=0.06631, simple_loss=0.09712, pruned_loss=0.0103, audio_tagging_loss=0.007443, over 15085.00 frames. 
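
Validation in this section fires exactly at batch 6000, followed immediately by a peak-memory report, i.e. the loop validates on a fixed batch interval and reads the CUDA allocator's high-water mark afterwards. A sketch of that cadence; the interval constant and function names are inferred, not read from the script:

    # Sketch of the cadence visible in the log: validate every
    # VALID_INTERVAL batches, then report peak GPU memory.
    import torch

    VALID_INTERVAL = 3000  # assumed; consistent with validation at batch 6000

    def maybe_validate(batch_idx: int, compute_validation_loss) -> None:
        if batch_idx > 0 and batch_idx % VALID_INTERVAL == 0:
            loss = compute_validation_loss()
            # High-water mark of the CUDA caching allocator, in MB
            # (requires a CUDA build; returns bytes since process start).
            peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
            print(f"validation: loss={loss:.5f}; "
                  f"Maximum memory allocated so far is {peak_mb}MB")
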
], tot_loss[loss=0.06543, simple_loss=0.08924, pruned_loss=0.01225, audio_tagging_loss=0.008564, over 3041469.12 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:16:37,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=12.0 2023-11-28 16:16:40,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3577680.0, ans=0.0 2023-11-28 16:16:46,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3577746.6666666665, ans=0.0 2023-11-28 16:16:51,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.772e+01 9.466e+01 1.004e+02 1.199e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 16:17:30,281 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536700 2023-11-28 16:17:34,909 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7650, loss[loss=0.09134, simple_loss=0.1239, pruned_loss=0.02096, audio_tagging_loss=0.008409, over 15588.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09012, pruned_loss=0.01237, audio_tagging_loss=0.008469, over 3046654.79 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:17:41,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3578013.3333333335, ans=0.0 2023-11-28 16:17:41,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2023-11-28 16:17:53,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3578080.0, ans=0.125 2023-11-28 16:17:53,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3578080.0, ans=0.125 2023-11-28 16:17:55,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3578080.0, ans=0.125 2023-11-28 16:17:59,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=22.5 2023-11-28 16:18:03,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3578146.6666666665, ans=0.1 2023-11-28 16:18:14,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=12.0 2023-11-28 16:18:26,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3578280.0, ans=0.125 2023-11-28 16:18:32,213 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536750 2023-11-28 16:18:37,465 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7700, loss[loss=0.07041, simple_loss=0.101, pruned_loss=0.01247, audio_tagging_loss=0.007462, over 15052.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08977, pruned_loss=0.01222, audio_tagging_loss=0.008528, over 3047373.36 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:18:47,929 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:18:57,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 9.045e+01 9.681e+01 1.026e+02 1.409e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 16:19:19,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3578546.6666666665, ans=10.0 2023-11-28 16:19:33,913 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536800 2023-11-28 16:19:39,976 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7750, loss[loss=0.07065, simple_loss=0.1048, pruned_loss=0.01104, audio_tagging_loss=0.007212, over 15794.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0896, pruned_loss=0.01218, audio_tagging_loss=0.008566, over 3042742.62 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:19:40,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2023-11-28 16:19:43,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3578680.0, ans=0.125 2023-11-28 16:19:57,467 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:20:07,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-28 16:20:09,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3578813.3333333335, ans=6.0 2023-11-28 16:20:31,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3578946.6666666665, ans=0.0 2023-11-28 16:20:35,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3578946.6666666665, ans=0.125 2023-11-28 16:20:37,916 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536850 2023-11-28 16:20:42,603 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7800, loss[loss=0.05957, simple_loss=0.07773, pruned_loss=0.01113, audio_tagging_loss=0.009572, over 15367.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08923, pruned_loss=0.01209, audio_tagging_loss=0.008636, over 3040757.18 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:20:45,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3579013.3333333335, ans=0.0 2023-11-28 16:21:01,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 9.025e+01 9.590e+01 1.021e+02 1.203e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:21:04,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. 
limit=12.0 2023-11-28 16:21:25,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3579213.3333333335, ans=0.125 2023-11-28 16:21:38,656 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536900 2023-11-28 16:21:43,994 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7850, loss[loss=0.06588, simple_loss=0.09164, pruned_loss=0.01165, audio_tagging_loss=0.008405, over 15336.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09065, pruned_loss=0.01239, audio_tagging_loss=0.008596, over 3042850.35 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:22:04,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-28 16:22:28,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3579546.6666666665, ans=0.1 2023-11-28 16:22:40,573 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536950 2023-11-28 16:22:45,246 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7900, loss[loss=0.06701, simple_loss=0.09449, pruned_loss=0.009394, audio_tagging_loss=0.01037, over 16126.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09049, pruned_loss=0.01223, audio_tagging_loss=0.00871, over 3051829.49 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:23:05,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.056e+01 9.028e+01 9.515e+01 1.023e+02 1.434e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 16:23:14,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3579813.3333333335, ans=0.125 2023-11-28 16:23:19,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3579813.3333333335, ans=0.125 2023-11-28 16:23:35,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3579946.6666666665, ans=0.0 2023-11-28 16:23:43,878 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537000 2023-11-28 16:23:45,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0 2023-11-28 16:23:48,710 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7950, loss[loss=0.05993, simple_loss=0.07618, pruned_loss=0.01071, audio_tagging_loss=0.01113, over 14558.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08968, pruned_loss=0.01213, audio_tagging_loss=0.008803, over 3053213.16 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:23:49,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3580013.3333333335, ans=0.0 2023-11-28 16:23:54,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3580013.3333333335, ans=0.0 2023-11-28 16:24:03,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=22.5 2023-11-28 16:24:06,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. 
limit=22.5 2023-11-28 16:24:07,197 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:24:08,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3580080.0, ans=0.125 2023-11-28 16:24:10,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3580080.0, ans=0.0 2023-11-28 16:24:16,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3580146.6666666665, ans=0.0 2023-11-28 16:24:20,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3580146.6666666665, ans=0.09899494936611666 2023-11-28 16:24:29,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3580213.3333333335, ans=0.0 2023-11-28 16:24:45,607 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537050 2023-11-28 16:24:48,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3580280.0, ans=0.2 2023-11-28 16:24:50,111 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8000, loss[loss=0.06165, simple_loss=0.07878, pruned_loss=0.01403, audio_tagging_loss=0.008227, over 16001.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08833, pruned_loss=0.01191, audio_tagging_loss=0.009007, over 3051219.10 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:25:11,264 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.789e+01 9.588e+01 1.026e+02 1.289e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:25:21,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2023-11-28 16:25:27,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3580546.6666666665, ans=0.04949747468305833 2023-11-28 16:25:29,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3580546.6666666665, ans=0.125 2023-11-28 16:25:41,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3580613.3333333335, ans=0.125 2023-11-28 16:25:47,481 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537100 2023-11-28 16:25:52,280 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8050, loss[loss=0.07713, simple_loss=0.1074, pruned_loss=0.01697, audio_tagging_loss=0.006447, over 15701.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08854, pruned_loss=0.01193, audio_tagging_loss=0.008998, over 3046899.64 frames. 
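The WARNING above drops a 1-second AudioSet clip: after subsampling only 23 encoder frames remain, fewer than its 24 BPE tokens, and the transducer loss is undefined when the frame count T is smaller than the token count U. A sketch of that filter, with the subsampling arithmetic inferred from the 100 -> 23 figures in the warning (function names are illustrative, not the exact train_asr.py code):

    # Sketch of the predicate behind the "Exclude cut" warnings: keep a cut
    # only if the encoder output is at least as long as the token sequence.
    def frames_after_subsampling(num_frames: int) -> int:
        # Reproduces the arithmetic in the warning: 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # T >= U is required for the transducer loss to be defined.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> the cut is excluded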
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:26:35,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3580880.0, ans=0.125 2023-11-28 16:26:40,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3580946.6666666665, ans=0.1 2023-11-28 16:26:48,845 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537150 2023-11-28 16:26:54,651 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8100, loss[loss=0.05698, simple_loss=0.07542, pruned_loss=0.01206, audio_tagging_loss=0.0072, over 14458.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08962, pruned_loss=0.01207, audio_tagging_loss=0.008872, over 3049493.59 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:26:58,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3581013.3333333335, ans=0.125 2023-11-28 16:27:09,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3581080.0, ans=0.125 2023-11-28 16:27:18,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.911e+01 9.512e+01 1.026e+02 1.565e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 16:27:32,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3581213.3333333335, ans=0.125 2023-11-28 16:27:36,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2023-11-28 16:27:53,977 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537200 2023-11-28 16:27:58,923 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8150, loss[loss=0.04767, simple_loss=0.06416, pruned_loss=0.006112, audio_tagging_loss=0.009476, over 15070.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08913, pruned_loss=0.01194, audio_tagging_loss=0.008775, over 3054025.09 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:28:34,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3581480.0, ans=0.125 2023-11-28 16:28:35,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2023-11-28 16:28:50,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.03 vs. limit=22.5 2023-11-28 16:28:56,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537250 2023-11-28 16:28:59,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2023-11-28 16:29:01,123 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8200, loss[loss=0.07074, simple_loss=0.09677, pruned_loss=0.01366, audio_tagging_loss=0.008698, over 16040.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08975, pruned_loss=0.01208, audio_tagging_loss=0.008639, over 3052694.98 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:29:05,705 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:29:13,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3581746.6666666665, ans=0.125 2023-11-28 16:29:23,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.010e+01 9.552e+01 1.037e+02 1.390e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 16:29:40,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2023-11-28 16:29:47,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3581880.0, ans=0.125 2023-11-28 16:29:47,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3581880.0, ans=0.125 2023-11-28 16:29:58,240 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537300 2023-11-28 16:30:00,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3581946.6666666665, ans=0.125 2023-11-28 16:30:02,792 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8250, loss[loss=0.06881, simple_loss=0.09762, pruned_loss=0.01366, audio_tagging_loss=0.006337, over 14839.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08907, pruned_loss=0.01199, audio_tagging_loss=0.008649, over 3044867.87 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:30:13,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3582013.3333333335, ans=0.0 2023-11-28 16:30:29,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3582146.6666666665, ans=0.125 2023-11-28 16:30:32,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3582146.6666666665, ans=0.0 2023-11-28 16:30:37,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2023-11-28 16:30:56,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3582280.0, ans=0.2 2023-11-28 16:30:56,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2023-11-28 16:31:00,847 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537350 2023-11-28 16:31:06,252 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8300, loss[loss=0.05561, simple_loss=0.07608, pruned_loss=0.009493, audio_tagging_loss=0.008077, over 14626.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08982, pruned_loss=0.01211, audio_tagging_loss=0.008554, over 3050089.92 frames. 
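The optim.py lines report Clipping_scale, five grad-norm quantiles (min, 25%, median, 75%, max), the clipping threshold, and the fraction of recent steps that were clipped; the numbers suggest the threshold tracks clipping_scale times the median (e.g. 1.910e+02 = 2.0 x 9.552e+01 just above). A hedged sketch of median-based clipping in that spirit, not the actual ScaledAdam implementation:

    # Sketch: clip gradients against clipping_scale x the running median of
    # recent gradient norms, and track the "percent-clipped" statistic.
    import torch

    class QuartileClipper:
        def __init__(self, clipping_scale=2.0, history=128):
            self.clipping_scale = clipping_scale
            self.history = history
            self.norms = []
            self.num_clipped = 0
            self.num_steps = 0

        def clip_(self, params):
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.linalg.vector_norm(
                torch.stack([torch.linalg.vector_norm(g) for g in grads])
            ).item()
            self.norms = (self.norms + [norm])[-self.history:]
            q = torch.quantile(
                torch.tensor(self.norms),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
            )  # the five "grad-norm quartiles" in the log
            threshold = self.clipping_scale * q[2].item()
            self.num_steps += 1
            if norm > threshold:
                self.num_clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)
            return 100.0 * self.num_clipped / self.num_steps  # percent-clipped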
], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:31:19,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3582413.3333333335, ans=0.2 2023-11-28 16:31:26,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3582413.3333333335, ans=0.125 2023-11-28 16:31:28,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.841e+01 9.438e+01 1.013e+02 1.279e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 16:31:34,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3582480.0, ans=0.1 2023-11-28 16:31:49,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-28 16:31:53,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3582546.6666666665, ans=0.125 2023-11-28 16:31:54,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3582546.6666666665, ans=0.125 2023-11-28 16:31:57,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3582613.3333333335, ans=0.125 2023-11-28 16:31:58,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3582613.3333333335, ans=0.125 2023-11-28 16:32:03,972 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537400 2023-11-28 16:32:08,982 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8350, loss[loss=0.07858, simple_loss=0.1147, pruned_loss=0.01522, audio_tagging_loss=0.006011, over 16053.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08921, pruned_loss=0.0119, audio_tagging_loss=0.008553, over 3044421.09 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:32:11,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3582680.0, ans=0.125 2023-11-28 16:32:29,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3582746.6666666665, ans=0.125 2023-11-28 16:32:29,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. 
limit=22.5 2023-11-28 16:32:30,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3582746.6666666665, ans=0.125 2023-11-28 16:32:45,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3582880.0, ans=0.1 2023-11-28 16:33:00,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3582946.6666666665, ans=0.2 2023-11-28 16:33:00,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3582946.6666666665, ans=0.0 2023-11-28 16:33:06,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537450 2023-11-28 16:33:11,435 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8400, loss[loss=0.06135, simple_loss=0.08232, pruned_loss=0.009443, audio_tagging_loss=0.01075, over 15164.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08862, pruned_loss=0.01188, audio_tagging_loss=0.008528, over 3046594.67 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:33:17,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3583013.3333333335, ans=0.125 2023-11-28 16:33:27,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3583080.0, ans=0.125 2023-11-28 16:33:31,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.81 vs. limit=10.0 2023-11-28 16:33:32,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3583080.0, ans=0.125 2023-11-28 16:33:34,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.936e+01 9.656e+01 1.030e+02 3.353e+02, threshold=1.931e+02, percent-clipped=1.0 2023-11-28 16:33:34,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3583080.0, ans=0.125 2023-11-28 16:33:41,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3583146.6666666665, ans=0.2 2023-11-28 16:33:51,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2023-11-28 16:34:09,834 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537500 2023-11-28 16:34:14,415 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8450, loss[loss=0.0736, simple_loss=0.09917, pruned_loss=0.01549, audio_tagging_loss=0.008522, over 14807.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08837, pruned_loss=0.01186, audio_tagging_loss=0.008509, over 3042948.98 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:34:16,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.24 vs. 
limit=22.5 2023-11-28 16:34:49,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3583480.0, ans=0.125 2023-11-28 16:35:10,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3583613.3333333335, ans=0.0 2023-11-28 16:35:10,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3583613.3333333335, ans=0.2 2023-11-28 16:35:13,096 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537550 2023-11-28 16:35:17,701 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8500, loss[loss=0.07084, simple_loss=0.09662, pruned_loss=0.01275, audio_tagging_loss=0.009779, over 14567.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08919, pruned_loss=0.01189, audio_tagging_loss=0.00855, over 3047153.40 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:35:18,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3583680.0, ans=0.125 2023-11-28 16:35:35,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3583746.6666666665, ans=0.125 2023-11-28 16:35:40,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.912e+01 9.575e+01 1.015e+02 1.303e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 16:35:41,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0 2023-11-28 16:35:43,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3583813.3333333335, ans=0.07 2023-11-28 16:35:48,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3583813.3333333335, ans=0.1 2023-11-28 16:35:49,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3583813.3333333335, ans=0.125 2023-11-28 16:36:04,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2023-11-28 16:36:14,182 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537600 2023-11-28 16:36:19,721 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8550, loss[loss=0.04878, simple_loss=0.06475, pruned_loss=0.008008, audio_tagging_loss=0.008394, over 14608.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08908, pruned_loss=0.01195, audio_tagging_loss=0.008614, over 3052818.45 frames. 
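The Whitening lines compare a per-module metric against a limit (e.g. "metric=5.45 vs. limit=6.0"); the metric behaves like an anisotropy measure of the grouped feature covariance, equal to 1.0 when the features are perfectly white and approaching the group dimension when one direction dominates, with a corrective penalty applied only above the limit. A sketch of such a metric, inferred behaviour rather than the scaling.py source:

    # Sketch: d * sum(eigvals^2) / (sum(eigvals))^2 of the per-group feature
    # covariance; 1.0 for isotropic features, up to d when rank-1 dominated.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels); channels split into equal groups.
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n       # (groups, d, d)
        d = cov.shape[-1]
        trace = cov.diagonal(dim1=-2, dim2=-1).sum(-1)     # sum of eigenvalues
        trace_sq = (cov * cov).sum(dim=(-2, -1))           # sum of squared eigenvalues
        return (trace_sq * d / trace.pow(2)).mean()

    x = torch.randn(1000, 192)     # near-white features
    print(whitening_metric(x, 1))  # ~1.0 (sampling noise pushes it slightly above)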
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:36:22,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3584013.3333333335, ans=0.0 2023-11-28 16:36:25,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3584013.3333333335, ans=0.95 2023-11-28 16:36:45,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3584146.6666666665, ans=0.025 2023-11-28 16:36:59,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3584213.3333333335, ans=0.0 2023-11-28 16:37:16,993 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537650 2023-11-28 16:37:19,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3584280.0, ans=0.0 2023-11-28 16:37:21,650 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8600, loss[loss=0.04385, simple_loss=0.05564, pruned_loss=0.007171, audio_tagging_loss=0.008856, over 14401.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08905, pruned_loss=0.012, audio_tagging_loss=0.008704, over 3049231.22 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:37:44,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.628e+01 8.790e+01 9.576e+01 1.011e+02 1.183e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 16:37:45,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3584480.0, ans=0.125 2023-11-28 16:37:48,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3584480.0, ans=0.09899494936611666 2023-11-28 16:37:53,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3584480.0, ans=0.04949747468305833 2023-11-28 16:38:02,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=15.0 2023-11-28 16:38:03,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3584546.6666666665, ans=0.1 2023-11-28 16:38:10,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3584613.3333333335, ans=0.125 2023-11-28 16:38:15,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3584613.3333333335, ans=0.2 2023-11-28 16:38:18,644 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537700 2023-11-28 16:38:22,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.16 vs. limit=22.5 2023-11-28 16:38:23,728 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8650, loss[loss=0.05567, simple_loss=0.07336, pruned_loss=0.009043, audio_tagging_loss=0.009944, over 14525.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0893, pruned_loss=0.01206, audio_tagging_loss=0.008759, over 3050360.32 frames. 
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:38:25,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3584680.0, ans=0.0 2023-11-28 16:38:29,501 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:38:32,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0 2023-11-28 16:38:40,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3584746.6666666665, ans=0.2 2023-11-28 16:38:45,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3584746.6666666665, ans=0.0 2023-11-28 16:38:56,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3584813.3333333335, ans=0.0 2023-11-28 16:38:57,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3584813.3333333335, ans=0.0 2023-11-28 16:39:15,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3584946.6666666665, ans=0.125 2023-11-28 16:39:21,427 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537750 2023-11-28 16:39:26,598 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8700, loss[loss=0.07, simple_loss=0.09788, pruned_loss=0.01205, audio_tagging_loss=0.009008, over 15186.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08941, pruned_loss=0.01199, audio_tagging_loss=0.008801, over 3047143.01 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:39:48,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 9.145e+01 9.850e+01 1.054e+02 1.476e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-28 16:39:49,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0 2023-11-28 16:39:49,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2023-11-28 16:39:52,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3585146.6666666665, ans=0.2 2023-11-28 16:40:10,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3585213.3333333335, ans=10.0 2023-11-28 16:40:20,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3585280.0, ans=0.0 2023-11-28 16:40:24,468 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537800 2023-11-28 16:40:29,353 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8750, loss[loss=0.0861, simple_loss=0.1326, pruned_loss=0.01564, audio_tagging_loss=0.00416, over 14882.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08949, pruned_loss=0.01198, audio_tagging_loss=0.008847, over 3046701.20 frames. 
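The per-batch loss fields combine as a weighted sum; the weights can be read off the logged numbers themselves, e.g. in the tot_loss just above, 0.5 x 0.08949 + 0.01198 + 0.008847 = 0.06557, i.e. the simple loss enters at half weight and the pruned and audio-tagging terms at full weight. As a small arithmetic check:

    # Loss combination implied by the logged values (weights inferred from
    # the numbers in this log, not quoted from the training code).
    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    print(combined_loss(0.08949, 0.01198, 0.008847))  # ~0.06557, as logged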
], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:40:47,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3585413.3333333335, ans=0.125 2023-11-28 16:41:12,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3585546.6666666665, ans=0.0 2023-11-28 16:41:21,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3585613.3333333335, ans=0.2 2023-11-28 16:41:26,407 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537850 2023-11-28 16:41:31,152 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8800, loss[loss=0.0752, simple_loss=0.1007, pruned_loss=0.01631, audio_tagging_loss=0.008564, over 14903.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09001, pruned_loss=0.01204, audio_tagging_loss=0.008927, over 3045925.45 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:41:31,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3585680.0, ans=0.125 2023-11-28 16:41:31,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2023-11-28 16:41:47,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3585746.6666666665, ans=0.0 2023-11-28 16:41:54,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 9.010e+01 9.598e+01 1.030e+02 1.176e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 16:42:05,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-28 16:42:05,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3585813.3333333335, ans=0.125 2023-11-28 16:42:28,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537900 2023-11-28 16:42:34,094 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8850, loss[loss=0.07589, simple_loss=0.1033, pruned_loss=0.01546, audio_tagging_loss=0.008766, over 15914.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09018, pruned_loss=0.01201, audio_tagging_loss=0.008943, over 3054245.89 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:42:38,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3586013.3333333335, ans=0.0 2023-11-28 16:42:43,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2023-11-28 16:42:50,881 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 16:43:03,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3586146.6666666665, ans=22.5 2023-11-28 16:43:24,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586280.0, ans=0.1 2023-11-28 16:43:31,357 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537950 2023-11-28 16:43:34,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-28 16:43:36,689 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8900, loss[loss=0.06678, simple_loss=0.0823, pruned_loss=0.01655, audio_tagging_loss=0.009075, over 14994.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09062, pruned_loss=0.0122, audio_tagging_loss=0.008799, over 3048563.82 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:43:44,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3586346.6666666665, ans=0.0 2023-11-28 16:43:49,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-28 16:43:52,642 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:43:59,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.009e+01 9.604e+01 1.041e+02 1.260e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 16:44:10,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3586480.0, ans=0.0 2023-11-28 16:44:22,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3586546.6666666665, ans=0.5 2023-11-28 16:44:23,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-11-28 16:44:29,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3586613.3333333335, ans=0.125 2023-11-28 16:44:33,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3586613.3333333335, ans=0.07 2023-11-28 16:44:33,946 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538000 2023-11-28 16:44:38,993 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8950, loss[loss=0.082, simple_loss=0.1079, pruned_loss=0.02203, audio_tagging_loss=0.006004, over 15641.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09118, pruned_loss=0.01246, audio_tagging_loss=0.008603, over 3054796.24 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:44:43,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.53 vs. 
limit=15.0 2023-11-28 16:44:50,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3586746.6666666665, ans=0.125 2023-11-28 16:44:52,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2023-11-28 16:45:37,212 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538050 2023-11-28 16:45:41,939 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9000, loss[loss=0.07776, simple_loss=0.1096, pruned_loss=0.01505, audio_tagging_loss=0.007904, over 14976.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09019, pruned_loss=0.01233, audio_tagging_loss=0.008611, over 3053164.79 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:45:41,940 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 16:46:23,737 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05837, simple_loss=0.05051, pruned_loss=0.005241, audio_tagging_loss=0.02788, over 4681554.00 frames. 2023-11-28 16:46:23,738 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 16:46:32,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3587013.3333333335, ans=0.1 2023-11-28 16:46:46,761 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.988e+01 9.549e+01 1.029e+02 1.340e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 16:46:57,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3587146.6666666665, ans=0.2 2023-11-28 16:47:02,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3587213.3333333335, ans=0.2 2023-11-28 16:47:20,986 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538100 2023-11-28 16:47:26,318 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9050, loss[loss=0.06573, simple_loss=0.08526, pruned_loss=0.01475, audio_tagging_loss=0.00835, over 15265.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09088, pruned_loss=0.01237, audio_tagging_loss=0.008522, over 3051270.26 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:47:45,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=15.0 2023-11-28 16:48:06,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3587546.6666666665, ans=0.1 2023-11-28 16:48:22,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3587613.3333333335, ans=0.125 2023-11-28 16:48:23,510 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538150 2023-11-28 16:48:28,191 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9100, loss[loss=0.05911, simple_loss=0.07586, pruned_loss=0.01225, audio_tagging_loss=0.008935, over 14816.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09076, pruned_loss=0.01225, audio_tagging_loss=0.008551, over 3050121.15 frames. 
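The "Computing validation loss" block above runs a periodic evaluation pass over the dev loader and then reports the peak GPU memory. A hedged sketch of the shape of such a pass, where compute_loss is a hypothetical helper standing in for the real per-batch loss computation:

    # Sketch of a periodic validation pass: eval mode, no gradients,
    # frames-weighted average loss, then peak-memory report.
    import torch

    def compute_validation_loss(model, valid_dl, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                # compute_loss is hypothetical: returns (loss, num_frames).
                loss, num_frames = compute_loss(model, batch, device)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        max_mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4g}; "
              f"Maximum memory allocated so far is {max_mem_mb}MB")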
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:48:52,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.776e+01 9.450e+01 1.014e+02 1.425e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 16:49:09,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=12.0 2023-11-28 16:49:26,133 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538200 2023-11-28 16:49:30,974 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9150, loss[loss=0.0635, simple_loss=0.08887, pruned_loss=0.01004, audio_tagging_loss=0.009026, over 15499.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09082, pruned_loss=0.01215, audio_tagging_loss=0.008474, over 3044943.51 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:49:33,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3588013.3333333335, ans=0.125 2023-11-28 16:49:41,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3588013.3333333335, ans=0.125 2023-11-28 16:49:54,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.43 vs. limit=15.0 2023-11-28 16:50:05,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=22.5 2023-11-28 16:50:08,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3588213.3333333335, ans=0.0 2023-11-28 16:50:21,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3588280.0, ans=0.2 2023-11-28 16:50:28,680 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538250 2023-11-28 16:50:33,270 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9200, loss[loss=0.06544, simple_loss=0.08824, pruned_loss=0.01255, audio_tagging_loss=0.00877, over 16245.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09046, pruned_loss=0.01209, audio_tagging_loss=0.008446, over 3050828.82 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:50:56,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 9.083e+01 9.538e+01 1.018e+02 1.192e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 16:51:18,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3588546.6666666665, ans=0.5 2023-11-28 16:51:29,992 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538300 2023-11-28 16:51:35,111 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9250, loss[loss=0.05051, simple_loss=0.06677, pruned_loss=0.00779, audio_tagging_loss=0.009336, over 16365.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08989, pruned_loss=0.012, audio_tagging_loss=0.008428, over 3053902.83 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:51:48,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. 
limit=6.0 2023-11-28 16:51:57,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3588746.6666666665, ans=0.02 2023-11-28 16:52:19,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2023-11-28 16:52:30,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3588946.6666666665, ans=0.125 2023-11-28 16:52:34,263 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538350 2023-11-28 16:52:39,055 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9300, loss[loss=0.04174, simple_loss=0.05242, pruned_loss=0.006745, audio_tagging_loss=0.008778, over 14504.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08927, pruned_loss=0.012, audio_tagging_loss=0.008547, over 3053751.94 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:52:42,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3589013.3333333335, ans=0.1 2023-11-28 16:52:58,383 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:52:58,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3589080.0, ans=0.0 2023-11-28 16:53:03,658 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.801e+01 8.822e+01 9.388e+01 1.037e+02 1.623e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 16:53:17,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3589213.3333333335, ans=0.125 2023-11-28 16:53:22,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3589213.3333333335, ans=0.1 2023-11-28 16:53:37,170 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538400 2023-11-28 16:53:37,447 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:53:42,841 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9350, loss[loss=0.05971, simple_loss=0.08518, pruned_loss=0.009346, audio_tagging_loss=0.007773, over 15769.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08991, pruned_loss=0.01201, audio_tagging_loss=0.008571, over 3056849.19 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:54:00,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3589413.3333333335, ans=0.2 2023-11-28 16:54:18,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.76 vs. 
limit=15.0 2023-11-28 16:54:20,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3589546.6666666665, ans=0.125 2023-11-28 16:54:21,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3589546.6666666665, ans=0.125 2023-11-28 16:54:28,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3589546.6666666665, ans=0.125 2023-11-28 16:54:33,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2023-11-28 16:54:40,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538450 2023-11-28 16:54:45,250 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9400, loss[loss=0.06382, simple_loss=0.08806, pruned_loss=0.01273, audio_tagging_loss=0.007063, over 15058.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08915, pruned_loss=0.01183, audio_tagging_loss=0.008705, over 3049785.97 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:54:55,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=22.5 2023-11-28 16:55:11,215 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.693e+01 8.781e+01 9.524e+01 1.025e+02 1.257e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-28 16:55:26,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.07 vs. limit=10.0 2023-11-28 16:55:28,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.63 vs. limit=10.0 2023-11-28 16:55:42,506 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538500 2023-11-28 16:55:47,178 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9450, loss[loss=0.06736, simple_loss=0.08386, pruned_loss=0.01648, audio_tagging_loss=0.008946, over 14909.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08885, pruned_loss=0.01188, audio_tagging_loss=0.008665, over 3052836.72 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:55:47,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3590013.3333333335, ans=0.1 2023-11-28 16:55:47,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3590013.3333333335, ans=0.1 2023-11-28 16:55:49,529 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 16:55:52,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3590013.3333333335, ans=0.04949747468305833 2023-11-28 16:55:56,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3590013.3333333335, ans=0.125 2023-11-28 16:56:04,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3590080.0, ans=0.125 2023-11-28 16:56:15,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3590146.6666666665, ans=0.125 2023-11-28 16:56:27,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3590213.3333333335, ans=0.2 2023-11-28 16:56:45,136 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538550 2023-11-28 16:56:49,787 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9500, loss[loss=0.06852, simple_loss=0.09622, pruned_loss=0.01054, audio_tagging_loss=0.009872, over 16699.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08956, pruned_loss=0.01204, audio_tagging_loss=0.008707, over 3054753.19 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:56:50,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3590346.6666666665, ans=0.1 2023-11-28 16:57:15,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2023-11-28 16:57:16,062 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 9.072e+01 9.708e+01 1.059e+02 2.012e+02, threshold=1.942e+02, percent-clipped=1.0 2023-11-28 16:57:47,648 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538600 2023-11-28 16:57:52,595 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9550, loss[loss=0.06683, simple_loss=0.09226, pruned_loss=0.01197, audio_tagging_loss=0.008729, over 14985.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09095, pruned_loss=0.01218, audio_tagging_loss=0.008768, over 3052700.82 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:58:13,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3590746.6666666665, ans=0.125 2023-11-28 16:58:30,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3590880.0, ans=0.1 2023-11-28 16:58:35,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3590880.0, ans=0.0 2023-11-28 16:58:50,217 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538650 2023-11-28 16:58:54,979 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9600, loss[loss=0.06399, simple_loss=0.08238, pruned_loss=0.01355, audio_tagging_loss=0.00924, over 15630.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09169, pruned_loss=0.01253, audio_tagging_loss=0.008848, over 3054121.45 frames. 
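The grad_scale values hopping between 8.0, 16.0 and 32.0 in these lines are the dynamic loss scale of fp16 mixed-precision training: the scale doubles after a run of overflow-free steps and halves when gradients overflow. A generic PyTorch sketch of that mechanism (the GradScaler options here are illustrative, not necessarily those used in this run):

    # Standard PyTorch dynamic loss scaling; get_scale() is the number
    # reported as "grad_scale" in logs like the ones above.
    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)

    def training_step(model, optimizer, features, targets, loss_fn):
        optimizer.zero_grad()
        with autocast():  # forward pass in fp16 where safe
            loss = loss_fn(model(features), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skips the step if gradients overflowed
        scaler.update()          # grows or backs off the scale
        return scaler.get_scale()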
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:59:05,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3591013.3333333335, ans=0.0 2023-11-28 16:59:21,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.897e+01 9.047e+01 9.589e+01 1.034e+02 1.302e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:59:29,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3591146.6666666665, ans=0.0 2023-11-28 16:59:29,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3591146.6666666665, ans=0.0 2023-11-28 16:59:30,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3591146.6666666665, ans=0.025 2023-11-28 16:59:31,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3591213.3333333335, ans=0.1 2023-11-28 16:59:38,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3591213.3333333335, ans=0.0 2023-11-28 16:59:44,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3591280.0, ans=0.2 2023-11-28 16:59:53,241 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538700 2023-11-28 16:59:53,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-11-28 16:59:57,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3591346.6666666665, ans=0.125 2023-11-28 16:59:57,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3591346.6666666665, ans=0.0 2023-11-28 16:59:57,916 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9650, loss[loss=0.06758, simple_loss=0.09023, pruned_loss=0.01595, audio_tagging_loss=0.006516, over 16256.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09152, pruned_loss=0.01249, audio_tagging_loss=0.008866, over 3056261.21 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:59:59,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3591346.6666666665, ans=0.2 2023-11-28 17:00:24,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3591480.0, ans=0.125 2023-11-28 17:00:24,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3591480.0, ans=0.07 2023-11-28 17:00:24,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.77 vs. 
limit=15.0 2023-11-28 17:00:27,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3591480.0, ans=0.0 2023-11-28 17:00:54,598 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538750 2023-11-28 17:00:57,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3591613.3333333335, ans=0.0 2023-11-28 17:00:59,820 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9700, loss[loss=0.04555, simple_loss=0.06603, pruned_loss=0.005825, audio_tagging_loss=0.006707, over 15652.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09197, pruned_loss=0.01257, audio_tagging_loss=0.008695, over 3060608.68 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 17:01:01,823 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:01:15,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2023-11-28 17:01:19,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3591746.6666666665, ans=0.125 2023-11-28 17:01:26,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 8.979e+01 9.434e+01 1.003e+02 1.570e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 17:01:30,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3591813.3333333335, ans=0.0 2023-11-28 17:01:30,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0 2023-11-28 17:01:31,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3591813.3333333335, ans=0.0 2023-11-28 17:01:34,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3591813.3333333335, ans=0.125 2023-11-28 17:01:39,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-28 17:01:39,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3591880.0, ans=0.125 2023-11-28 17:01:57,456 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538800 2023-11-28 17:02:02,765 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9750, loss[loss=0.07329, simple_loss=0.1048, pruned_loss=0.01242, audio_tagging_loss=0.008461, over 14799.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09196, pruned_loss=0.01256, audio_tagging_loss=0.008521, over 3054652.64 frames. 
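The fractional frame counts in the tot_loss entries (e.g. "over 3054652.64 frames") suggest the running total is an exponentially decayed, frames-weighted average rather than a plain sum. A sketch under that assumption, with an illustrative decay constant:

    # Sketch: frames-weighted running loss with exponential decay, which
    # would produce fractional cumulative frame counts like those logged.
    class RunningLoss:
        def __init__(self, decay=1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames  # the reported tot_loss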
], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:02:07,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3592013.3333333335, ans=0.125 2023-11-28 17:02:11,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3592013.3333333335, ans=0.0 2023-11-28 17:02:41,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3592213.3333333335, ans=0.125 2023-11-28 17:02:42,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3592213.3333333335, ans=0.125 2023-11-28 17:02:53,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3592280.0, ans=0.0 2023-11-28 17:02:57,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3592280.0, ans=0.0 2023-11-28 17:02:59,591 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538850 2023-11-28 17:03:04,977 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9800, loss[loss=0.0424, simple_loss=0.04943, pruned_loss=0.006957, audio_tagging_loss=0.01072, over 14984.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09073, pruned_loss=0.01237, audio_tagging_loss=0.008494, over 3051649.12 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:03:22,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3592413.3333333335, ans=0.125 2023-11-28 17:03:31,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.992e+01 9.640e+01 1.016e+02 2.169e+02, threshold=1.928e+02, percent-clipped=1.0 2023-11-28 17:04:02,227 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538900 2023-11-28 17:04:02,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3592613.3333333335, ans=0.125 2023-11-28 17:04:04,651 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:04:07,525 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9850, loss[loss=0.07737, simple_loss=0.1032, pruned_loss=0.01791, audio_tagging_loss=0.007872, over 15357.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09012, pruned_loss=0.01233, audio_tagging_loss=0.008454, over 3051830.11 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:04:10,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.89 vs. 
limit=10.0 2023-11-28 17:04:28,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3592746.6666666665, ans=0.2 2023-11-28 17:04:44,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3592880.0, ans=0.2 2023-11-28 17:04:57,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=22.5 2023-11-28 17:05:00,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3592946.6666666665, ans=0.0 2023-11-28 17:05:00,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3592946.6666666665, ans=0.2 2023-11-28 17:05:02,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-28 17:05:05,047 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538950 2023-11-28 17:05:08,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3592946.6666666665, ans=0.1 2023-11-28 17:05:10,983 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9900, loss[loss=0.0732, simple_loss=0.09879, pruned_loss=0.01564, audio_tagging_loss=0.008171, over 15235.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08986, pruned_loss=0.0123, audio_tagging_loss=0.008359, over 3048650.96 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:05:29,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3593080.0, ans=0.125 2023-11-28 17:05:36,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 9.160e+01 9.894e+01 1.082e+02 1.663e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-28 17:06:04,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3593280.0, ans=0.125 2023-11-28 17:06:08,651 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539000 2023-11-28 17:06:14,240 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9950, loss[loss=0.06812, simple_loss=0.09255, pruned_loss=0.01334, audio_tagging_loss=0.008501, over 17122.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08979, pruned_loss=0.01222, audio_tagging_loss=0.008297, over 3046103.84 frames. ], batch size: 62, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:06:22,892 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:06:32,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3593413.3333333335, ans=0.05 2023-11-28 17:06:39,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3593480.0, ans=0.0 2023-11-28 17:06:42,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.31 vs. 
limit=22.5 2023-11-28 17:06:53,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3593546.6666666665, ans=0.125 2023-11-28 17:07:10,811 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539050 2023-11-28 17:07:14,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3593680.0, ans=0.1 2023-11-28 17:07:15,456 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10000, loss[loss=0.04934, simple_loss=0.06822, pruned_loss=0.007727, audio_tagging_loss=0.007499, over 14683.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08931, pruned_loss=0.01207, audio_tagging_loss=0.00827, over 3043072.17 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:07:42,047 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.648e+01 9.149e+01 9.983e+01 1.212e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-28 17:07:56,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3593880.0, ans=0.125 2023-11-28 17:08:02,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3593880.0, ans=0.0 2023-11-28 17:08:07,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3593946.6666666665, ans=0.1 2023-11-28 17:08:10,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3593946.6666666665, ans=0.0 2023-11-28 17:08:13,323 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539100 2023-11-28 17:08:18,064 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10050, loss[loss=0.08237, simple_loss=0.1086, pruned_loss=0.01899, audio_tagging_loss=0.009083, over 15007.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08925, pruned_loss=0.01199, audio_tagging_loss=0.008363, over 3047676.89 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:08:39,247 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:08:43,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2023-11-28 17:08:57,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2023-11-28 17:09:16,309 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539150 2023-11-28 17:09:17,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3594280.0, ans=0.125 2023-11-28 17:09:21,024 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10100, loss[loss=0.04874, simple_loss=0.06102, pruned_loss=0.008959, audio_tagging_loss=0.009268, over 14845.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08905, pruned_loss=0.01197, audio_tagging_loss=0.008397, over 3053312.62 frames. 
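The optim.py:476 records report quartiles of recent gradient norms, a clipping threshold, and the fraction of batches clipped; the printed thresholds (~1.9e+02) sit at roughly twice the median norm (~9.5e+01), consistent with `Clipping_scale=2.0`. A hedged sketch of how such a report could be computed; the sliding window and the threshold rule (Clipping_scale times the median) are assumptions:

```python
# Hedged sketch of a grad-norm quartile report like optim.py:476 prints.
import torch

def grad_norm_report(norms: torch.Tensor, clipping_scale: float = 2.0):
    # quartiles (min, 25%, median, 75%, max) of recent per-batch grad norms
    qs = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2]           # 2x median: assumed rule
    pct_clipped = 100.0 * (norms > threshold).float().mean()
    return qs, threshold, pct_clipped

norms = 90 + 10 * torch.randn(128).abs()         # fake recent grad norms
qs, thr, pct = grad_norm_report(norms)
print(" ".join(f"{q:.3e}" for q in qs),
      f"threshold={thr:.3e}", f"percent-clipped={pct:.1f}")
```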
], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:09:48,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.910e+01 9.610e+01 1.020e+02 1.223e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 17:10:04,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3594546.6666666665, ans=0.125 2023-11-28 17:10:10,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3594613.3333333335, ans=0.125 2023-11-28 17:10:16,216 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:10:17,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3594613.3333333335, ans=0.0 2023-11-28 17:10:18,777 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539200 2023-11-28 17:10:23,880 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10150, loss[loss=0.05198, simple_loss=0.05491, pruned_loss=0.0115, audio_tagging_loss=0.01302, over 16127.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08894, pruned_loss=0.012, audio_tagging_loss=0.008522, over 3055122.09 frames. ], batch size: 63, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:10:56,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3594813.3333333335, ans=0.0 2023-11-28 17:10:57,593 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:11:01,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3594880.0, ans=0.09899494936611666 2023-11-28 17:11:04,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3594880.0, ans=0.125 2023-11-28 17:11:04,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3594880.0, ans=0.125 2023-11-28 17:11:21,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539250 2023-11-28 17:11:26,361 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10200, loss[loss=0.06769, simple_loss=0.09249, pruned_loss=0.01245, audio_tagging_loss=0.008999, over 15420.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08812, pruned_loss=0.01185, audio_tagging_loss=0.008675, over 3049899.37 frames. 
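The train_asr.py:1481 warnings above ("Exclude cut with ID unbalanced/...") all follow the same pattern: AudioSet clips carry a dummy transcript, and a 1-second cut yields 100 feature frames, i.e. 23 frames after 4x subsampling, which is fewer than the 24 BPE tokens, so the transducer loss is undefined and the cut is dropped. A hedged sketch of that filter; the function name is illustrative, but the 100 -> 23 frame mapping matches the logged numbers:

```python
# Hedged sketch of the length filter behind the "Exclude cut" warnings.
def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
    # Conv front-end plus 2x2 subsampling; ((n - 7) // 2 + 1) // 2
    # reproduces the logged mapping 100 frames -> 23 frames.
    t = ((num_frames_before_subsampling - 7) // 2 + 1) // 2
    return t >= num_tokens   # transducer needs at least one frame per token

print(keep_cut(100, 24))  # False -> "Exclude cut ... from training."
```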
], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:11:54,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.974e+01 9.604e+01 1.042e+02 1.393e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 17:11:54,390 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:12:09,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0 2023-11-28 17:12:14,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-11-28 17:12:15,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3595280.0, ans=0.125 2023-11-28 17:12:17,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3595280.0, ans=0.125 2023-11-28 17:12:17,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3595280.0, ans=0.0 2023-11-28 17:12:17,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3595280.0, ans=0.1 2023-11-28 17:12:24,025 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539300 2023-11-28 17:12:28,731 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10250, loss[loss=0.05947, simple_loss=0.07261, pruned_loss=0.01264, audio_tagging_loss=0.01052, over 15534.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.0884, pruned_loss=0.01201, audio_tagging_loss=0.008777, over 3050893.08 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:12:33,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3595346.6666666665, ans=0.125 2023-11-28 17:12:34,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.88 vs. limit=10.0 2023-11-28 17:12:36,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.11 vs. 
limit=22.5 2023-11-28 17:13:00,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3595480.0, ans=0.125 2023-11-28 17:13:07,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3595546.6666666665, ans=0.2 2023-11-28 17:13:12,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3595546.6666666665, ans=0.125 2023-11-28 17:13:26,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3595613.3333333335, ans=0.1 2023-11-28 17:13:27,199 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539350 2023-11-28 17:13:27,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.35 vs. limit=10.0 2023-11-28 17:13:31,864 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10300, loss[loss=0.0638, simple_loss=0.0844, pruned_loss=0.0108, audio_tagging_loss=0.0108, over 15087.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08945, pruned_loss=0.0121, audio_tagging_loss=0.008797, over 3049777.93 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:13:36,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2023-11-28 17:13:59,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 9.050e+01 9.766e+01 1.043e+02 1.224e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-28 17:14:02,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3595813.3333333335, ans=0.125 2023-11-28 17:14:09,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2023-11-28 17:14:21,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3595946.6666666665, ans=0.125 2023-11-28 17:14:25,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3595946.6666666665, ans=0.0 2023-11-28 17:14:29,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539400 2023-11-28 17:14:34,239 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10350, loss[loss=0.08704, simple_loss=0.1279, pruned_loss=0.01527, audio_tagging_loss=0.007801, over 16190.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09009, pruned_loss=0.01212, audio_tagging_loss=0.008876, over 3048409.57 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:14:36,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3596013.3333333335, ans=0.0 2023-11-28 17:14:39,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. 
limit=15.0 2023-11-28 17:14:43,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3596013.3333333335, ans=0.125 2023-11-28 17:14:52,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3596080.0, ans=0.125 2023-11-28 17:14:59,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5 2023-11-28 17:15:00,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3596146.6666666665, ans=0.2 2023-11-28 17:15:01,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3596146.6666666665, ans=0.125 2023-11-28 17:15:02,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2023-11-28 17:15:30,072 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539450 2023-11-28 17:15:34,657 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10400, loss[loss=0.07062, simple_loss=0.1011, pruned_loss=0.01091, audio_tagging_loss=0.009158, over 15742.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08948, pruned_loss=0.01197, audio_tagging_loss=0.009021, over 3046093.81 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:15:49,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-11-28 17:16:01,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.937e+01 9.027e+01 9.708e+01 1.021e+02 1.407e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 17:16:12,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3596546.6666666665, ans=0.0 2023-11-28 17:16:13,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.17 vs. limit=15.0 2023-11-28 17:16:14,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3596546.6666666665, ans=0.0 2023-11-28 17:16:32,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539500 2023-11-28 17:16:33,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3596613.3333333335, ans=0.1 2023-11-28 17:16:36,755 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10450, loss[loss=0.06819, simple_loss=0.09815, pruned_loss=0.01243, audio_tagging_loss=0.006691, over 15289.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09011, pruned_loss=0.01228, audio_tagging_loss=0.008902, over 3042927.83 frames. 
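The scaling.py:1022 records ("Whitening: ... metric=9.88 vs. limit=15.0") track how far a module's activations are from being white, acting only when the metric exceeds its limit. One plausible reconstruction of such a metric, normalized so that perfectly white (isotropic) features score 1.0; this is an assumption, not the exact icefall formula:

```python
# Hedged sketch of a per-group whiteness metric like scaling.py:1022 logs.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels are split into groups
    chans = x.shape[-1] // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * chans:(g + 1) * chans]
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / xg.shape[0]
        # E[lambda^2] / E[lambda]^2 == 1 iff all eigenvalues are equal
        metrics.append((cov @ cov).trace() * chans / cov.trace() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(1000, 256)                 # near-white features
print(whitening_metric(x, num_groups=1))   # ~1.0, well under a limit of 15.0
```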
], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:16:44,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3596680.0, ans=0.125 2023-11-28 17:16:46,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3596680.0, ans=0.125 2023-11-28 17:16:46,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2023-11-28 17:16:58,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3596746.6666666665, ans=0.125 2023-11-28 17:17:33,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539550 2023-11-28 17:17:38,921 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10500, loss[loss=0.0754, simple_loss=0.104, pruned_loss=0.01768, audio_tagging_loss=0.005746, over 16217.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08938, pruned_loss=0.01225, audio_tagging_loss=0.008765, over 3044585.76 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:17:40,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3597013.3333333335, ans=0.0 2023-11-28 17:17:47,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3597013.3333333335, ans=0.0 2023-11-28 17:17:50,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-28 17:18:06,790 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.931e+01 9.605e+01 1.033e+02 2.073e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-28 17:18:07,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3597146.6666666665, ans=0.125 2023-11-28 17:18:10,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3597146.6666666665, ans=0.0 2023-11-28 17:18:30,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3597280.0, ans=10.0 2023-11-28 17:18:31,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3597280.0, ans=0.1 2023-11-28 17:18:34,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3597280.0, ans=0.05 2023-11-28 17:18:35,971 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539600 2023-11-28 17:18:40,863 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10550, loss[loss=0.06619, simple_loss=0.09038, pruned_loss=0.01213, audio_tagging_loss=0.008869, over 16173.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08904, pruned_loss=0.01223, audio_tagging_loss=0.008717, over 3043394.18 frames. 
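Each train_asr.py:1235 record prints two loss tuples: `loss[...]` for the current batch and `tot_loss[...]` as an average "over N frames" of recent batches. A hedged sketch of one way to maintain such a figure, as a frame-weighted running mean (icefall's actual tracker may aggregate differently); the two batches below are the 10550 and 10600 records nearby:

```python
# Hedged sketch of the frame-weighted running average behind tot_loss[...].
class RunningLoss:
    def __init__(self):
        self.sum = 0.0
        self.frames = 0

    def update(self, loss: float, num_frames: int) -> None:
        self.sum += loss * num_frames      # weight batches by frame count
        self.frames += num_frames

    @property
    def avg(self) -> float:
        return self.sum / max(self.frames, 1)

tot = RunningLoss()
tot.update(0.06619, 16173)   # batch 10550's loss / frames from the log
tot.update(0.06385, 15046)   # batch 10600's loss / frames from the log
print(f"tot_loss[loss={tot.avg:.5f}, over {tot.frames} frames.]")  # 0.06506
```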
], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:18:44,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3597346.6666666665, ans=0.0 2023-11-28 17:18:57,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3597413.3333333335, ans=0.125 2023-11-28 17:19:15,059 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:19:19,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. limit=10.0 2023-11-28 17:19:37,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539650 2023-11-28 17:19:42,577 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10600, loss[loss=0.06385, simple_loss=0.08515, pruned_loss=0.01347, audio_tagging_loss=0.007807, over 15046.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08893, pruned_loss=0.01221, audio_tagging_loss=0.008619, over 3041174.37 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:19:48,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-28 17:20:10,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3597813.3333333335, ans=10.0 2023-11-28 17:20:11,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.139e+01 8.954e+01 9.589e+01 1.025e+02 1.251e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 17:20:27,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3597880.0, ans=0.0 2023-11-28 17:20:28,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3597880.0, ans=0.125 2023-11-28 17:20:40,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539700 2023-11-28 17:20:45,326 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10650, loss[loss=0.07442, simple_loss=0.1122, pruned_loss=0.0114, audio_tagging_loss=0.006924, over 14775.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08911, pruned_loss=0.01221, audio_tagging_loss=0.008456, over 3044854.31 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:20:59,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3598080.0, ans=0.0 2023-11-28 17:21:26,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.68 vs. limit=22.5 2023-11-28 17:21:38,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3598280.0, ans=0.0 2023-11-28 17:21:41,965 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539750 2023-11-28 17:21:45,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3598280.0, ans=0.1 2023-11-28 17:21:45,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.63 vs. 
limit=15.0 2023-11-28 17:21:47,281 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10700, loss[loss=0.05493, simple_loss=0.06721, pruned_loss=0.01224, audio_tagging_loss=0.009083, over 15687.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08934, pruned_loss=0.01232, audio_tagging_loss=0.008465, over 3045696.14 frames. ], batch size: 62, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:21:55,767 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:22:04,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.69 vs. limit=15.0 2023-11-28 17:22:15,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.739e+01 9.237e+01 1.012e+02 1.304e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-28 17:22:24,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3598546.6666666665, ans=0.2 2023-11-28 17:22:24,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3598546.6666666665, ans=0.0 2023-11-28 17:22:25,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3598546.6666666665, ans=0.2 2023-11-28 17:22:27,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-28 17:22:41,617 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:22:43,824 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539800 2023-11-28 17:22:49,071 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10750, loss[loss=0.06719, simple_loss=0.08747, pruned_loss=0.01343, audio_tagging_loss=0.01003, over 14690.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08985, pruned_loss=0.01227, audio_tagging_loss=0.008425, over 3050111.24 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:23:09,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3598746.6666666665, ans=0.0 2023-11-28 17:23:15,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-11-28 17:23:21,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3598813.3333333335, ans=0.5 2023-11-28 17:23:24,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3598813.3333333335, ans=0.1 2023-11-28 17:23:37,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=22.5 2023-11-28 17:23:37,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3598946.6666666665, ans=0.125 2023-11-28 17:23:40,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. 
limit=6.0 2023-11-28 17:23:45,923 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539850 2023-11-28 17:23:51,154 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10800, loss[loss=0.03755, simple_loss=0.04032, pruned_loss=0.002964, audio_tagging_loss=0.01443, over 13919.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08906, pruned_loss=0.01215, audio_tagging_loss=0.008411, over 3042731.36 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:24:10,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3599080.0, ans=0.09899494936611666 2023-11-28 17:24:19,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.985e+01 9.429e+01 1.046e+02 1.643e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 17:24:27,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3599213.3333333335, ans=0.04949747468305833 2023-11-28 17:24:34,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3599213.3333333335, ans=10.0 2023-11-28 17:24:37,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0 2023-11-28 17:24:38,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-28 17:24:45,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3599280.0, ans=0.1 2023-11-28 17:24:46,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3599280.0, ans=0.2 2023-11-28 17:24:48,020 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539900 2023-11-28 17:24:53,243 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10850, loss[loss=0.06758, simple_loss=0.08826, pruned_loss=0.01537, audio_tagging_loss=0.008072, over 14752.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08917, pruned_loss=0.01199, audio_tagging_loss=0.008414, over 3040766.19 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:25:10,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3599413.3333333335, ans=0.1 2023-11-28 17:25:18,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3599480.0, ans=0.1 2023-11-28 17:25:35,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3599546.6666666665, ans=0.125 2023-11-28 17:25:49,846 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539950 2023-11-28 17:25:55,010 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10900, loss[loss=0.07399, simple_loss=0.1103, pruned_loss=0.01287, audio_tagging_loss=0.005968, over 15377.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09018, pruned_loss=0.01214, audio_tagging_loss=0.008444, over 3042847.13 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:25:55,064 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
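The `grad_scale` value printed with each batch record flips between 16.0 and 32.0 in this stretch. That is standard fp16 behavior: the gradient scaler doubles its scale after a run of overflow-free steps and halves it whenever a batch produces inf/nan gradients. A sketch using torch's own GradScaler; the specific growth settings below are illustrative:

```python
# Hedged sketch of why grad_scale oscillates between 16.0 and 32.0 above.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)
# After an overflowing batch the scale would read 16.0, then double back to
# 32.0 once growth_interval clean steps have passed.
print(scaler.get_scale())  # 32.0 on a CUDA machine (1.0 if AMP is disabled)
```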
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:25:58,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3599680.0, ans=0.0 2023-11-28 17:26:06,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3599746.6666666665, ans=0.125 2023-11-28 17:26:08,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3599746.6666666665, ans=0.95 2023-11-28 17:26:09,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3599746.6666666665, ans=0.05 2023-11-28 17:26:23,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.863e+01 9.525e+01 1.011e+02 1.256e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-28 17:26:51,458 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540000 2023-11-28 17:26:58,859 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10950, loss[loss=0.08281, simple_loss=0.1172, pruned_loss=0.01742, audio_tagging_loss=0.006806, over 15186.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08981, pruned_loss=0.01217, audio_tagging_loss=0.00857, over 3047094.92 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:27:01,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3600013.3333333335, ans=0.125 2023-11-28 17:27:03,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3600013.3333333335, ans=0.05 2023-11-28 17:27:21,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3600080.0, ans=0.125 2023-11-28 17:27:36,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3600213.3333333335, ans=0.0 2023-11-28 17:27:41,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3600213.3333333335, ans=0.125 2023-11-28 17:27:44,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600213.3333333335, ans=0.1 2023-11-28 17:27:56,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540050 2023-11-28 17:27:56,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3600280.0, ans=0.125 2023-11-28 17:28:01,176 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11000, loss[loss=0.06628, simple_loss=0.09253, pruned_loss=0.01346, audio_tagging_loss=0.006557, over 16196.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09083, pruned_loss=0.0124, audio_tagging_loss=0.008599, over 3053013.16 frames. 
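Many of the scheduled values above belong to balancers, whose hyperparameters appear by name: per channel, the fraction of positive activations should stay within [min_positive, max_positive] and the mean absolute value within [min_abs, max_abs], with `prob` (often 0.125 here) the chance the check fires on a given batch. A hedged diagnostic in that spirit; the real balancer applies a correcting gradient rather than just counting violations:

```python
# Hedged sketch: count channels violating balancer-style constraints.
import torch

def balancer_violations(x, min_positive=0.05, max_positive=0.95,
                        min_abs=0.2, max_abs=10.0):
    frac_pos = (x > 0).float().mean(dim=0)   # per-channel positive fraction
    mean_abs = x.abs().mean(dim=0)           # per-channel mean magnitude
    bad = ((frac_pos < min_positive) | (frac_pos > max_positive) |
           (mean_abs < min_abs) | (mean_abs > max_abs))
    return bad.float().mean().item()         # fraction of channels off-spec

x = torch.randn(4096, 512)
print(balancer_violations(x))  # ~0.0 for well-behaved activations
```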
], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:28:01,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3600346.6666666665, ans=0.0 2023-11-28 17:28:02,524 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:28:05,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3600346.6666666665, ans=0.125 2023-11-28 17:28:08,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3600346.6666666665, ans=0.5 2023-11-28 17:28:15,638 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:28:17,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3600413.3333333335, ans=0.0 2023-11-28 17:28:18,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3600413.3333333335, ans=0.1 2023-11-28 17:28:30,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.952e+01 9.649e+01 1.058e+02 1.351e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 17:28:45,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3600546.6666666665, ans=0.125 2023-11-28 17:28:58,202 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540100 2023-11-28 17:29:02,659 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11050, loss[loss=0.0667, simple_loss=0.09727, pruned_loss=0.008323, audio_tagging_loss=0.009744, over 15834.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09053, pruned_loss=0.0122, audio_tagging_loss=0.008672, over 3056077.47 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:29:34,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3600813.3333333335, ans=0.0 2023-11-28 17:29:43,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3600880.0, ans=0.0 2023-11-28 17:29:57,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3600946.6666666665, ans=0.0 2023-11-28 17:29:59,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540150 2023-11-28 17:30:04,325 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11100, loss[loss=0.07596, simple_loss=0.1025, pruned_loss=0.01532, audio_tagging_loss=0.009378, over 14846.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08951, pruned_loss=0.01205, audio_tagging_loss=0.008772, over 3050940.28 frames. 
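The scaling.py:1118 records ("WithLoss: name=...self_attn_weights, loss-sum=0.000e+00") suggest an auxiliary penalty attached to the attention weights whose per-batch sum is logged; a sum of 0.0 means the weights currently incur no penalty. The hinge-style penalty below is an invented stand-in to show the shape of such a term, not the actual icefall loss:

```python
# Hedged sketch of an auxiliary attention-weight penalty with a logged sum.
import torch

def attn_aux_loss(attn: torch.Tensor, max_weight: float = 0.99) -> torch.Tensor:
    # penalize only attention mass above max_weight; zero for healthy heads
    return torch.clamp(attn - max_weight, min=0.0).sum()

attn = torch.softmax(torch.randn(4, 8, 100, 100), dim=-1)
print(f"loss-sum={attn_aux_loss(attn):.3e}")  # typically 0.000e+00
```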
], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:30:25,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3601080.0, ans=0.125 2023-11-28 17:30:34,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 8.960e+01 9.547e+01 1.044e+02 1.303e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 17:30:35,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3601146.6666666665, ans=0.2 2023-11-28 17:30:35,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3601146.6666666665, ans=0.125 2023-11-28 17:30:37,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=15.0 2023-11-28 17:31:01,475 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540200 2023-11-28 17:31:04,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3601280.0, ans=0.1 2023-11-28 17:31:06,499 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11150, loss[loss=0.07304, simple_loss=0.09346, pruned_loss=0.0168, audio_tagging_loss=0.009508, over 15981.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08993, pruned_loss=0.01223, audio_tagging_loss=0.008881, over 3056148.05 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:31:29,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3601413.3333333335, ans=0.2 2023-11-28 17:31:46,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3601546.6666666665, ans=0.125 2023-11-28 17:32:04,050 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540250 2023-11-28 17:32:08,646 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11200, loss[loss=0.06738, simple_loss=0.08855, pruned_loss=0.01439, audio_tagging_loss=0.008714, over 16491.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08939, pruned_loss=0.01221, audio_tagging_loss=0.008965, over 3050554.41 frames. ], batch size: 62, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:32:19,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3601680.0, ans=0.2 2023-11-28 17:32:38,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.987e+01 9.522e+01 1.045e+02 1.448e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 17:32:41,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.36 vs. limit=22.5 2023-11-28 17:32:47,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. 
limit=15.0 2023-11-28 17:33:03,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3601946.6666666665, ans=0.0 2023-11-28 17:33:04,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3601946.6666666665, ans=0.0 2023-11-28 17:33:05,107 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540300 2023-11-28 17:33:09,748 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11250, loss[loss=0.05526, simple_loss=0.07386, pruned_loss=0.009754, audio_tagging_loss=0.008575, over 14652.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08897, pruned_loss=0.01209, audio_tagging_loss=0.00899, over 3053542.67 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:33:20,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3602013.3333333335, ans=0.2 2023-11-28 17:33:27,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-28 17:33:39,343 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:33:50,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3602213.3333333335, ans=0.125 2023-11-28 17:33:51,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3602213.3333333335, ans=0.125 2023-11-28 17:33:58,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=15.0 2023-11-28 17:34:07,199 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540350 2023-11-28 17:34:11,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.31 vs. limit=10.0 2023-11-28 17:34:11,697 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11300, loss[loss=0.1001, simple_loss=0.1371, pruned_loss=0.02576, audio_tagging_loss=0.005836, over 15889.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08968, pruned_loss=0.01226, audio_tagging_loss=0.008805, over 3046209.27 frames. 
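The model.py:807 lines ("Freeze_encoder: False; Current batch idx: 540350") are a heartbeat emitted every 50 batches, confirming the encoder is still being trained and how far the global batch counter has advanced. A minimal sketch; the interval and message are read off the log, the function name is illustrative:

```python
# Hedged sketch of the periodic model.py:807 progress line.
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def maybe_log_progress(batch_idx: int, freeze_encoder: bool,
                       interval: int = 50) -> None:
    if batch_idx % interval == 0:
        logging.info("Freeze_encoder: %s; Current batch idx: %d",
                     freeze_encoder, batch_idx)

maybe_log_progress(540350, freeze_encoder=False)
```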
], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:34:20,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3602346.6666666665, ans=0.2 2023-11-28 17:34:20,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3602346.6666666665, ans=0.0 2023-11-28 17:34:41,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.983e+01 9.542e+01 1.053e+02 1.409e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 17:34:41,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3602480.0, ans=0.125 2023-11-28 17:34:42,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3602480.0, ans=0.1 2023-11-28 17:34:43,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3602480.0, ans=0.125 2023-11-28 17:34:59,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3602613.3333333335, ans=0.0 2023-11-28 17:35:08,568 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540400 2023-11-28 17:35:14,230 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11350, loss[loss=0.08548, simple_loss=0.1174, pruned_loss=0.01967, audio_tagging_loss=0.007114, over 14928.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08987, pruned_loss=0.01232, audio_tagging_loss=0.008745, over 3043876.89 frames. ], batch size: 52, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:35:15,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3602680.0, ans=0.05 2023-11-28 17:35:22,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3602680.0, ans=0.1 2023-11-28 17:35:24,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3602680.0, ans=0.02 2023-11-28 17:35:25,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3602746.6666666665, ans=0.2 2023-11-28 17:35:29,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3602746.6666666665, ans=0.5 2023-11-28 17:35:38,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3602813.3333333335, ans=0.07 2023-11-28 17:35:49,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3602880.0, ans=0.0 2023-11-28 17:36:06,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3602946.6666666665, ans=0.125 2023-11-28 17:36:08,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3602946.6666666665, ans=0.0 2023-11-28 17:36:11,249 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540450 2023-11-28 17:36:12,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3602946.6666666665, ans=0.125 2023-11-28 17:36:15,773 INFO 
[train_asr.py:1235] (3/4) Epoch 45, batch 11400, loss[loss=0.08842, simple_loss=0.1242, pruned_loss=0.02058, audio_tagging_loss=0.005758, over 14957.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09082, pruned_loss=0.01256, audio_tagging_loss=0.008497, over 3043879.03 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:36:25,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3603013.3333333335, ans=0.07 2023-11-28 17:36:29,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3603080.0, ans=0.0 2023-11-28 17:36:33,161 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:36:33,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3603080.0, ans=0.0 2023-11-28 17:36:35,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3603080.0, ans=15.0 2023-11-28 17:36:36,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3603080.0, ans=0.0 2023-11-28 17:36:47,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.040e+01 9.661e+01 1.043e+02 1.391e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 17:37:08,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=12.0 2023-11-28 17:37:12,690 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540500 2023-11-28 17:37:18,026 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11450, loss[loss=0.06424, simple_loss=0.09029, pruned_loss=0.009793, audio_tagging_loss=0.009305, over 14244.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.091, pruned_loss=0.01254, audio_tagging_loss=0.008472, over 3046919.71 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:38:00,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2023-11-28 17:38:02,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.89 vs. limit=10.0 2023-11-28 17:38:15,201 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540550 2023-11-28 17:38:19,845 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11500, loss[loss=0.03797, simple_loss=0.04921, pruned_loss=0.005327, audio_tagging_loss=0.008041, over 16100.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0902, pruned_loss=0.01233, audio_tagging_loss=0.008417, over 3049235.02 frames. ], batch size: 64, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:38:21,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3603680.0, ans=0.125 2023-11-28 17:38:50,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.641e+01 9.267e+01 1.033e+02 1.350e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 17:38:59,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.69 vs. 
limit=10.0 2023-11-28 17:39:12,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2023-11-28 17:39:17,488 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540600 2023-11-28 17:39:22,434 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11550, loss[loss=0.09474, simple_loss=0.136, pruned_loss=0.0196, audio_tagging_loss=0.007124, over 14888.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09075, pruned_loss=0.01246, audio_tagging_loss=0.008417, over 3050583.49 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:40:05,114 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:40:18,749 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540650 2023-11-28 17:40:23,063 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11600, loss[loss=0.07107, simple_loss=0.09291, pruned_loss=0.01323, audio_tagging_loss=0.01139, over 14326.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09101, pruned_loss=0.01251, audio_tagging_loss=0.008447, over 3050553.68 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:40:55,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.847e+01 9.637e+01 1.017e+02 1.289e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 17:40:58,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3604480.0, ans=0.0 2023-11-28 17:41:00,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3604546.6666666665, ans=0.125 2023-11-28 17:41:02,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2023-11-28 17:41:06,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2023-11-28 17:41:07,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3604546.6666666665, ans=0.125 2023-11-28 17:41:20,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0 2023-11-28 17:41:21,321 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540700 2023-11-28 17:41:26,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2023-11-28 17:41:26,701 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11650, loss[loss=0.06068, simple_loss=0.09023, pruned_loss=0.009824, audio_tagging_loss=0.005736, over 15109.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09027, pruned_loss=0.01237, audio_tagging_loss=0.008555, over 3051506.54 frames. 
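The learning rate printed with each record (1.50e-03 early in this stretch, 1.49e-03 later) is consistent with icefall's Eden schedule, which decays with both batch count and epoch. A sketch assuming that formula, with this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and the scheduler's epoch counter at 44 (completed epochs); plugging in a batch index near 538700 reproduces the logged value:

```python
# Hedged sketch of the Eden-style lr that matches the "lr:" fields above.
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

print(f"{eden_lr(0.045, batch=538700, epoch=44):.2e}")  # 1.50e-03, as logged
```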
], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:41:39,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3604746.6666666665, ans=0.125 2023-11-28 17:41:46,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3604746.6666666665, ans=0.125 2023-11-28 17:41:59,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3604813.3333333335, ans=0.0 2023-11-28 17:42:10,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3604880.0, ans=0.125 2023-11-28 17:42:10,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3604880.0, ans=0.125 2023-11-28 17:42:19,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2023-11-28 17:42:22,890 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540750 2023-11-28 17:42:28,511 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11700, loss[loss=0.04685, simple_loss=0.06138, pruned_loss=0.007425, audio_tagging_loss=0.008737, over 16664.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08915, pruned_loss=0.01214, audio_tagging_loss=0.008672, over 3053286.57 frames. ], batch size: 64, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:42:37,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3605013.3333333335, ans=0.125 2023-11-28 17:42:37,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3605013.3333333335, ans=0.0 2023-11-28 17:42:54,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-28 17:42:58,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 9.057e+01 9.679e+01 1.035e+02 1.386e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 17:43:01,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3605146.6666666665, ans=0.125 2023-11-28 17:43:16,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3605280.0, ans=0.125 2023-11-28 17:43:24,697 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540800 2023-11-28 17:43:29,698 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11750, loss[loss=0.06095, simple_loss=0.07701, pruned_loss=0.01189, audio_tagging_loss=0.01055, over 14583.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08862, pruned_loss=0.01212, audio_tagging_loss=0.00876, over 3053474.85 frames. 
], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:43:29,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3605346.6666666665, ans=0.125 2023-11-28 17:43:49,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3605413.3333333335, ans=0.125 2023-11-28 17:43:51,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3605413.3333333335, ans=0.2 2023-11-28 17:43:51,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3605413.3333333335, ans=0.2 2023-11-28 17:44:25,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3605613.3333333335, ans=0.125 2023-11-28 17:44:25,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3605613.3333333335, ans=0.125 2023-11-28 17:44:27,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540850 2023-11-28 17:44:29,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3605613.3333333335, ans=0.125 2023-11-28 17:44:32,159 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11800, loss[loss=0.07421, simple_loss=0.1043, pruned_loss=0.01512, audio_tagging_loss=0.006954, over 16296.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08901, pruned_loss=0.01206, audio_tagging_loss=0.008678, over 3050844.10 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:44:38,299 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:44:50,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2023-11-28 17:45:02,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.699e+01 9.349e+01 9.967e+01 1.294e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 17:45:10,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3605880.0, ans=0.0 2023-11-28 17:45:10,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3605880.0, ans=0.1 2023-11-28 17:45:25,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3605946.6666666665, ans=0.0 2023-11-28 17:45:28,729 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540900 2023-11-28 17:45:33,904 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11850, loss[loss=0.05838, simple_loss=0.07243, pruned_loss=0.01121, audio_tagging_loss=0.01095, over 14939.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08852, pruned_loss=0.01188, audio_tagging_loss=0.008799, over 3043642.54 frames. 
], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:45:43,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3606013.3333333335, ans=15.0 2023-11-28 17:45:57,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3606146.6666666665, ans=0.05 2023-11-28 17:46:01,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3606146.6666666665, ans=0.0 2023-11-28 17:46:10,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3606213.3333333335, ans=0.2 2023-11-28 17:46:21,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-28 17:46:25,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3606280.0, ans=0.125 2023-11-28 17:46:27,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2023-11-28 17:46:30,820 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540950 2023-11-28 17:46:34,548 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:46:35,526 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11900, loss[loss=0.07662, simple_loss=0.1022, pruned_loss=0.01525, audio_tagging_loss=0.01029, over 15390.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08877, pruned_loss=0.01196, audio_tagging_loss=0.008933, over 3042701.25 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:46:57,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3606413.3333333335, ans=0.125 2023-11-28 17:47:05,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=3606480.0, ans=12.0 2023-11-28 17:47:06,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.830e+01 8.976e+01 9.494e+01 1.024e+02 1.214e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 17:47:33,210 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541000 2023-11-28 17:47:33,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3606613.3333333335, ans=0.2 2023-11-28 17:47:38,212 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11950, loss[loss=0.04743, simple_loss=0.06189, pruned_loss=0.007028, audio_tagging_loss=0.009463, over 13744.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08952, pruned_loss=0.01199, audio_tagging_loss=0.008891, over 3044873.40 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:47:44,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.65 vs. 
limit=12.0 2023-11-28 17:47:56,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3606746.6666666665, ans=0.125 2023-11-28 17:48:15,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3606880.0, ans=0.0 2023-11-28 17:48:30,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3606946.6666666665, ans=0.1 2023-11-28 17:48:33,754 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541050 2023-11-28 17:48:38,215 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 12000, loss[loss=0.06384, simple_loss=0.09293, pruned_loss=0.01024, audio_tagging_loss=0.007141, over 15391.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08949, pruned_loss=0.01199, audio_tagging_loss=0.008916, over 3039413.75 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:48:38,216 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 17:48:53,580 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6211, 3.0702, 3.2274, 2.7687, 3.5141, 3.4668, 3.5669, 3.4202], device='cuda:3') 2023-11-28 17:49:16,761 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05759, simple_loss=0.05051, pruned_loss=0.005251, audio_tagging_loss=0.02709, over 4681554.00 frames. 2023-11-28 17:49:16,762 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 17:49:33,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-28 17:50:04,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-28 17:50:04,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2023-11-28 17:50:05,863 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 0, loss[loss=0.0821, simple_loss=0.09244, pruned_loss=0.01445, audio_tagging_loss=0.02143, over 15318.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.09244, pruned_loss=0.01445, audio_tagging_loss=0.02143, over 15318.00 frames. ], batch size: 57, lr: 1.48e-03, grad_scale: 32.0 2023-11-28 17:50:05,864 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 17:50:19,455 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3255, 4.7964, 5.1839, 4.4956], device='cuda:3') 2023-11-28 17:50:20,483 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0432, 3.6927, 3.8333, 3.5057, 4.2721, 4.2455, 4.4044, 4.2357], device='cuda:3') 2023-11-28 17:50:41,976 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05787, simple_loss=0.05054, pruned_loss=0.005286, audio_tagging_loss=0.02732, over 4681554.00 frames. 
2023-11-28 17:50:41,977 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 17:50:43,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.886e+01 9.608e+01 1.034e+02 1.479e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 17:50:43,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3607186.6666666665, ans=0.125 2023-11-28 17:50:54,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3607253.3333333335, ans=0.125 2023-11-28 17:51:06,758 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541100 2023-11-28 17:51:06,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3607320.0, ans=0.0 2023-11-28 17:51:19,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=12.0 2023-11-28 17:51:43,539 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 50, loss[loss=0.08655, simple_loss=0.114, pruned_loss=0.01367, audio_tagging_loss=0.01586, over 15747.00 frames. ], tot_loss[loss=0.07486, simple_loss=0.09077, pruned_loss=0.01253, audio_tagging_loss=0.01694, over 695558.59 frames. ], batch size: 57, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:52:07,733 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541150 2023-11-28 17:52:09,704 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:52:22,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3607720.0, ans=0.125 2023-11-28 17:52:44,946 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 100, loss[loss=0.06682, simple_loss=0.08739, pruned_loss=0.01192, audio_tagging_loss=0.01121, over 15340.00 frames. ], tot_loss[loss=0.07335, simple_loss=0.0899, pruned_loss=0.01226, audio_tagging_loss=0.01614, over 1214773.00 frames. ], batch size: 57, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:52:47,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.554e+01 1.000e+02 1.063e+02 1.121e+02 1.597e+02, threshold=2.127e+02, percent-clipped=0.0 2023-11-28 17:52:47,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-28 17:53:02,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3607920.0, ans=0.04949747468305833 2023-11-28 17:53:10,020 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541200 2023-11-28 17:53:17,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3607986.6666666665, ans=0.0 2023-11-28 17:53:41,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608120.0, ans=0.125 2023-11-28 17:53:43,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3608120.0, ans=0.2 2023-11-28 17:53:47,624 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 150, loss[loss=0.06426, simple_loss=0.09428, pruned_loss=0.009737, audio_tagging_loss=0.007387, over 15281.00 frames. 
], tot_loss[loss=0.07085, simple_loss=0.08881, pruned_loss=0.01196, audio_tagging_loss=0.01449, over 1622020.42 frames. ], batch size: 56, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:53:53,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-28 17:54:00,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3608253.3333333335, ans=0.0 2023-11-28 17:54:11,751 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541250 2023-11-28 17:54:18,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-28 17:54:25,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-28 17:54:32,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-28 17:54:36,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608453.3333333335, ans=0.1 2023-11-28 17:54:36,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2023-11-28 17:54:49,306 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 200, loss[loss=0.0849, simple_loss=0.1092, pruned_loss=0.02177, audio_tagging_loss=0.008505, over 15299.00 frames. ], tot_loss[loss=0.07011, simple_loss=0.09004, pruned_loss=0.01223, audio_tagging_loss=0.01285, over 1942807.47 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:54:50,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3608520.0, ans=0.0 2023-11-28 17:54:51,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 9.120e+01 9.843e+01 1.065e+02 1.310e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-28 17:55:12,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608653.3333333335, ans=0.1 2023-11-28 17:55:13,188 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541300 2023-11-28 17:55:20,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-11-28 17:55:30,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3608720.0, ans=0.125 2023-11-28 17:55:37,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3608786.6666666665, ans=0.125 2023-11-28 17:55:50,899 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 250, loss[loss=0.06294, simple_loss=0.08534, pruned_loss=0.01136, audio_tagging_loss=0.008908, over 15215.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09, pruned_loss=0.01213, audio_tagging_loss=0.01162, over 2185252.74 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:56:05,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3608920.0, ans=0.125 2023-11-28 17:56:13,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3608920.0, ans=0.125 2023-11-28 17:56:16,215 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541350 2023-11-28 17:56:26,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3608986.6666666665, ans=0.0 2023-11-28 17:56:33,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3609053.3333333335, ans=0.0 2023-11-28 17:56:45,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3609120.0, ans=0.0 2023-11-28 17:56:50,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3609120.0, ans=0.125 2023-11-28 17:56:53,094 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 300, loss[loss=0.064, simple_loss=0.0835, pruned_loss=0.01354, audio_tagging_loss=0.008707, over 15026.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.0901, pruned_loss=0.01209, audio_tagging_loss=0.01076, over 2375079.94 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:56:55,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.170e+01 9.069e+01 9.733e+01 1.020e+02 1.805e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 17:57:03,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3609186.6666666665, ans=0.2 2023-11-28 17:57:03,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3609186.6666666665, ans=0.0 2023-11-28 17:57:17,682 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541400 2023-11-28 17:57:23,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3609320.0, ans=0.125 2023-11-28 17:57:30,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3609386.6666666665, ans=0.2 2023-11-28 17:57:33,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3609386.6666666665, ans=0.1 2023-11-28 17:57:36,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3609386.6666666665, ans=0.2 2023-11-28 17:57:41,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3609453.3333333335, ans=0.0 2023-11-28 17:57:55,487 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 350, loss[loss=0.06482, simple_loss=0.0944, pruned_loss=0.009723, audio_tagging_loss=0.007897, over 15090.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.08986, pruned_loss=0.01217, audio_tagging_loss=0.01012, over 2531019.09 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:58:05,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3609520.0, ans=0.125 2023-11-28 17:58:09,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3609586.6666666665, ans=0.125 2023-11-28 17:58:09,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609586.6666666665, ans=0.1 2023-11-28 17:58:16,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3609586.6666666665, ans=0.0 2023-11-28 17:58:19,655 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541450 2023-11-28 17:58:21,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3609653.3333333335, ans=0.0 2023-11-28 17:58:41,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3609720.0, ans=0.05 2023-11-28 17:58:57,261 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 400, loss[loss=0.0705, simple_loss=0.1021, pruned_loss=0.01406, audio_tagging_loss=0.005404, over 15147.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08939, pruned_loss=0.0121, audio_tagging_loss=0.009817, over 2641615.21 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 17:58:59,620 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.699e+01 9.057e+01 9.604e+01 1.022e+02 1.428e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 17:59:08,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3609920.0, ans=0.0 2023-11-28 17:59:10,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3609920.0, ans=0.04949747468305833 2023-11-28 17:59:21,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541500 2023-11-28 17:59:58,041 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 450, loss[loss=0.05063, simple_loss=0.06388, pruned_loss=0.005727, audio_tagging_loss=0.01296, over 16080.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.08955, pruned_loss=0.01234, audio_tagging_loss=0.009484, over 2731884.56 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:00:20,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.41 vs. limit=10.0 2023-11-28 18:00:23,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541550 2023-11-28 18:01:00,940 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 500, loss[loss=0.06842, simple_loss=0.08812, pruned_loss=0.01592, audio_tagging_loss=0.008449, over 15144.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08924, pruned_loss=0.01212, audio_tagging_loss=0.009314, over 2800153.11 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:01:04,997 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.722e+01 9.408e+01 1.020e+02 1.286e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 18:01:25,484 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541600 2023-11-28 18:01:34,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3610653.3333333335, ans=0.125 2023-11-28 18:01:58,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3610786.6666666665, ans=0.125 2023-11-28 18:02:01,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3610853.3333333335, ans=0.125 2023-11-28 18:02:02,611 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 550, loss[loss=0.05369, simple_loss=0.0787, pruned_loss=0.008001, audio_tagging_loss=0.006338, over 16415.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08868, pruned_loss=0.01205, audio_tagging_loss=0.009145, over 2856865.53 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:02:04,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3610853.3333333335, ans=0.0 2023-11-28 18:02:17,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3610920.0, ans=0.0 2023-11-28 18:02:19,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3610920.0, ans=0.125 2023-11-28 18:02:27,408 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541650 2023-11-28 18:02:38,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3611053.3333333335, ans=0.0 2023-11-28 18:02:42,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3611053.3333333335, ans=0.0 2023-11-28 18:02:45,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3611053.3333333335, ans=0.0 2023-11-28 18:02:49,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-28 18:03:04,245 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 600, loss[loss=0.06051, simple_loss=0.07793, pruned_loss=0.009514, audio_tagging_loss=0.01203, over 16539.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08957, pruned_loss=0.01204, audio_tagging_loss=0.009091, over 2901921.97 frames. 
], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:03:07,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.115e+01 9.737e+01 1.046e+02 1.247e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 18:03:07,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3611186.6666666665, ans=0.125 2023-11-28 18:03:20,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3611253.3333333335, ans=0.1 2023-11-28 18:03:23,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3611253.3333333335, ans=0.2 2023-11-28 18:03:23,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3611253.3333333335, ans=0.2 2023-11-28 18:03:23,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-28 18:03:24,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3611253.3333333335, ans=0.0 2023-11-28 18:03:29,338 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541700 2023-11-28 18:03:47,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3611386.6666666665, ans=0.1 2023-11-28 18:03:50,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3611386.6666666665, ans=0.0 2023-11-28 18:03:50,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=12.0 2023-11-28 18:03:56,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611453.3333333335, ans=0.1 2023-11-28 18:04:01,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=22.5 2023-11-28 18:04:05,981 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 650, loss[loss=0.06279, simple_loss=0.08304, pruned_loss=0.01338, audio_tagging_loss=0.007885, over 15934.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08947, pruned_loss=0.01205, audio_tagging_loss=0.009055, over 2936500.02 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:04:21,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3611586.6666666665, ans=0.2 2023-11-28 18:04:31,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541750 2023-11-28 18:04:36,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.95 vs. 
limit=6.0 2023-11-28 18:04:38,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3611653.3333333335, ans=0.125 2023-11-28 18:04:45,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3611720.0, ans=0.2 2023-11-28 18:04:53,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-28 18:04:56,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3611786.6666666665, ans=0.125 2023-11-28 18:05:08,077 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 700, loss[loss=0.05853, simple_loss=0.07339, pruned_loss=0.01155, audio_tagging_loss=0.01029, over 14891.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08963, pruned_loss=0.01204, audio_tagging_loss=0.008936, over 2955273.39 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:05:12,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 8.893e+01 9.585e+01 1.037e+02 1.398e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 18:05:13,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3611853.3333333335, ans=0.125 2023-11-28 18:05:30,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3611920.0, ans=0.1 2023-11-28 18:05:33,548 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541800 2023-11-28 18:05:45,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3612053.3333333335, ans=0.125 2023-11-28 18:05:53,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3612053.3333333335, ans=0.125 2023-11-28 18:06:09,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3612120.0, ans=0.125 2023-11-28 18:06:11,809 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 750, loss[loss=0.06919, simple_loss=0.09301, pruned_loss=0.01257, audio_tagging_loss=0.01011, over 14337.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08944, pruned_loss=0.01209, audio_tagging_loss=0.008937, over 2973583.43 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:06:25,715 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:06:33,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3612253.3333333335, ans=0.2 2023-11-28 18:06:36,841 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541850 2023-11-28 18:06:40,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3612320.0, ans=0.125 2023-11-28 18:07:09,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612453.3333333335, ans=0.1 2023-11-28 18:07:14,135 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 800, loss[loss=0.09058, simple_loss=0.1318, pruned_loss=0.01769, audio_tagging_loss=0.007011, over 15185.00 frames. 
], tot_loss[loss=0.06586, simple_loss=0.08963, pruned_loss=0.01209, audio_tagging_loss=0.008955, over 2990459.58 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:07:15,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2023-11-28 18:07:17,631 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.017e+01 9.748e+01 1.044e+02 1.462e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 18:07:20,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3612520.0, ans=0.025 2023-11-28 18:07:23,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2023-11-28 18:07:30,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3612586.6666666665, ans=0.0 2023-11-28 18:07:39,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541900 2023-11-28 18:07:44,654 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:07:54,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612720.0, ans=0.1 2023-11-28 18:07:58,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3612720.0, ans=0.125 2023-11-28 18:08:03,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3612786.6666666665, ans=0.0 2023-11-28 18:08:05,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3612786.6666666665, ans=0.0 2023-11-28 18:08:07,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3612786.6666666665, ans=0.125 2023-11-28 18:08:12,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-28 18:08:13,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3612786.6666666665, ans=0.125 2023-11-28 18:08:16,481 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 850, loss[loss=0.04781, simple_loss=0.05622, pruned_loss=0.008087, audio_tagging_loss=0.01161, over 14704.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08932, pruned_loss=0.01197, audio_tagging_loss=0.009008, over 2998858.19 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:08:41,265 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541950 2023-11-28 18:09:01,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3613053.3333333335, ans=0.125 2023-11-28 18:09:15,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=15.0 2023-11-28 18:09:18,549 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 900, loss[loss=0.08269, simple_loss=0.1175, pruned_loss=0.0168, audio_tagging_loss=0.007159, over 16501.00 frames. 
], tot_loss[loss=0.06586, simple_loss=0.08949, pruned_loss=0.012, audio_tagging_loss=0.009119, over 3012448.81 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:09:24,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.864e+01 9.446e+01 1.016e+02 1.435e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 18:09:24,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3613186.6666666665, ans=0.125 2023-11-28 18:09:32,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3613253.3333333335, ans=0.125 2023-11-28 18:09:43,176 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542000 2023-11-28 18:09:43,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3613320.0, ans=0.2 2023-11-28 18:09:46,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.89 vs. limit=22.5 2023-11-28 18:10:00,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3613386.6666666665, ans=0.125 2023-11-28 18:10:20,687 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 950, loss[loss=0.05533, simple_loss=0.07733, pruned_loss=0.009167, audio_tagging_loss=0.007494, over 14487.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08982, pruned_loss=0.01197, audio_tagging_loss=0.008887, over 3017273.22 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:10:45,406 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542050 2023-11-28 18:10:47,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.28 vs. limit=15.0 2023-11-28 18:11:07,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3613720.0, ans=0.125 2023-11-28 18:11:09,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=22.5 2023-11-28 18:11:21,869 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1000, loss[loss=0.05982, simple_loss=0.08154, pruned_loss=0.01015, audio_tagging_loss=0.008897, over 14632.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09013, pruned_loss=0.01193, audio_tagging_loss=0.008747, over 3021609.02 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:11:23,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3613853.3333333335, ans=0.2 2023-11-28 18:11:27,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 8.919e+01 9.596e+01 1.036e+02 1.232e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 18:11:27,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-28 18:11:37,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3613920.0, ans=0.125 2023-11-28 18:11:46,608 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542100 2023-11-28 18:11:50,858 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:11:59,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3614053.3333333335, ans=0.125 2023-11-28 18:12:01,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3614053.3333333335, ans=0.0 2023-11-28 18:12:24,608 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1050, loss[loss=0.07483, simple_loss=0.1023, pruned_loss=0.0148, audio_tagging_loss=0.008898, over 16311.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09005, pruned_loss=0.01196, audio_tagging_loss=0.008712, over 3028706.56 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:12:49,404 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542150 2023-11-28 18:13:02,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3614386.6666666665, ans=0.125 2023-11-28 18:13:12,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3614386.6666666665, ans=0.07 2023-11-28 18:13:20,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3614453.3333333335, ans=0.125 2023-11-28 18:13:22,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3614453.3333333335, ans=0.1 2023-11-28 18:13:26,577 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1100, loss[loss=0.04962, simple_loss=0.07488, pruned_loss=0.004994, audio_tagging_loss=0.007188, over 14256.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08927, pruned_loss=0.01198, audio_tagging_loss=0.008703, over 3033863.26 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:13:30,461 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:13:31,284 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:13:32,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 8.948e+01 9.564e+01 1.065e+02 1.707e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-28 18:13:40,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3614586.6666666665, ans=0.125 2023-11-28 18:13:50,820 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542200 2023-11-28 18:13:52,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2023-11-28 18:13:54,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3614653.3333333335, ans=0.125 2023-11-28 18:13:55,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3614653.3333333335, ans=0.125 2023-11-28 18:13:57,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3614653.3333333335, ans=0.0 2023-11-28 18:14:10,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3614720.0, ans=0.0 2023-11-28 18:14:28,983 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1150, loss[loss=0.06389, simple_loss=0.08971, pruned_loss=0.01218, audio_tagging_loss=0.006857, over 14018.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08917, pruned_loss=0.01196, audio_tagging_loss=0.008646, over 3028009.00 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:14:32,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-28 18:14:36,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3614853.3333333335, ans=0.0 2023-11-28 18:14:53,884 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542250 2023-11-28 18:15:07,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3615053.3333333335, ans=0.0 2023-11-28 18:15:17,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3615120.0, ans=0.0 2023-11-28 18:15:21,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3615120.0, ans=0.125 2023-11-28 18:15:31,077 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1200, loss[loss=0.05808, simple_loss=0.07913, pruned_loss=0.01039, audio_tagging_loss=0.008129, over 14305.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08897, pruned_loss=0.01195, audio_tagging_loss=0.008561, over 3027401.71 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:15:35,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3615186.6666666665, ans=0.0 2023-11-28 18:15:36,981 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.860e+01 9.476e+01 1.010e+02 2.147e+02, threshold=1.895e+02, percent-clipped=1.0 2023-11-28 18:15:43,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3615253.3333333335, ans=0.125 2023-11-28 18:15:55,708 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542300 2023-11-28 18:16:30,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3615453.3333333335, ans=0.125 2023-11-28 18:16:33,392 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1250, loss[loss=0.07206, simple_loss=0.1033, pruned_loss=0.01334, audio_tagging_loss=0.007048, over 14839.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08928, pruned_loss=0.01202, audio_tagging_loss=0.008515, over 3031037.05 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:16:44,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2023-11-28 18:16:48,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3615586.6666666665, ans=0.2 2023-11-28 18:16:51,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3615586.6666666665, ans=0.125 2023-11-28 18:16:57,652 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542350 2023-11-28 18:17:13,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615720.0, ans=0.1 2023-11-28 18:17:21,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3615786.6666666665, ans=0.2 2023-11-28 18:17:28,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3615786.6666666665, ans=0.125 2023-11-28 18:17:35,299 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1300, loss[loss=0.05165, simple_loss=0.06942, pruned_loss=0.01045, audio_tagging_loss=0.006497, over 13996.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08907, pruned_loss=0.01202, audio_tagging_loss=0.008456, over 3033115.45 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:17:41,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.626e+01 8.784e+01 9.305e+01 1.002e+02 1.226e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-28 18:17:41,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. 
limit=15.0 2023-11-28 18:17:59,130 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542400 2023-11-28 18:18:11,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3616053.3333333335, ans=0.125 2023-11-28 18:18:28,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3616120.0, ans=0.125 2023-11-28 18:18:37,340 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1350, loss[loss=0.05581, simple_loss=0.06375, pruned_loss=0.01245, audio_tagging_loss=0.01148, over 17218.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08863, pruned_loss=0.01205, audio_tagging_loss=0.008466, over 3031781.60 frames. ], batch size: 65, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:18:50,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3616253.3333333335, ans=0.0 2023-11-28 18:19:02,222 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542450 2023-11-28 18:19:23,263 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:19:37,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-28 18:19:38,412 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1400, loss[loss=0.05121, simple_loss=0.06921, pruned_loss=0.008412, audio_tagging_loss=0.008195, over 14735.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08858, pruned_loss=0.01199, audio_tagging_loss=0.008555, over 3032862.78 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:19:45,140 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.002e+01 9.471e+01 1.001e+02 1.235e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 18:19:48,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3616520.0, ans=0.0 2023-11-28 18:19:50,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3616586.6666666665, ans=0.125 2023-11-28 18:20:03,620 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542500 2023-11-28 18:20:40,611 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1450, loss[loss=0.06187, simple_loss=0.09189, pruned_loss=0.008869, audio_tagging_loss=0.007057, over 15766.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08998, pruned_loss=0.01212, audio_tagging_loss=0.008605, over 3041572.09 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:20:48,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3616853.3333333335, ans=0.05 2023-11-28 18:20:49,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3616853.3333333335, ans=0.0 2023-11-28 18:20:50,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3616853.3333333335, ans=0.04949747468305833 2023-11-28 18:20:53,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3616920.0, ans=0.125 2023-11-28 18:20:54,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3616920.0, ans=0.0 2023-11-28 18:21:03,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3616920.0, ans=0.125 2023-11-28 18:21:05,399 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542550 2023-11-28 18:21:17,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0 2023-11-28 18:21:42,930 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1500, loss[loss=0.08833, simple_loss=0.1181, pruned_loss=0.02148, audio_tagging_loss=0.007819, over 14071.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09028, pruned_loss=0.0122, audio_tagging_loss=0.008715, over 3042186.77 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:21:50,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 9.243e+01 1.008e+02 1.066e+02 1.395e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-28 18:21:50,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3617186.6666666665, ans=0.0 2023-11-28 18:22:07,881 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542600 2023-11-28 18:22:33,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2023-11-28 18:22:45,108 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1550, loss[loss=0.07585, simple_loss=0.09899, pruned_loss=0.01644, audio_tagging_loss=0.009913, over 16007.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08964, pruned_loss=0.012, audio_tagging_loss=0.0087, over 3041244.07 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:23:09,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3617653.3333333335, ans=0.1 2023-11-28 18:23:10,241 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542650 2023-11-28 18:23:22,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3617720.0, ans=0.125 2023-11-28 18:23:46,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3617853.3333333335, ans=0.125 2023-11-28 18:23:47,208 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1600, loss[loss=0.06384, simple_loss=0.08557, pruned_loss=0.01102, audio_tagging_loss=0.01005, over 15067.00 frames. 
], tot_loss[loss=0.06562, simple_loss=0.08951, pruned_loss=0.01207, audio_tagging_loss=0.008789, over 3049121.45 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:23:54,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.889e+01 9.133e+01 9.762e+01 1.043e+02 1.262e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 18:24:11,808 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542700 2023-11-28 18:24:20,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2023-11-28 18:24:37,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3618120.0, ans=0.125 2023-11-28 18:24:47,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3618186.6666666665, ans=0.2 2023-11-28 18:24:48,494 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1650, loss[loss=0.07581, simple_loss=0.1199, pruned_loss=0.01118, audio_tagging_loss=0.004675, over 15156.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08952, pruned_loss=0.01205, audio_tagging_loss=0.008816, over 3046038.33 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:24:50,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3618186.6666666665, ans=0.2 2023-11-28 18:24:58,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3618186.6666666665, ans=0.125 2023-11-28 18:25:02,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3618253.3333333335, ans=0.125 2023-11-28 18:25:11,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0 2023-11-28 18:25:13,463 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542750 2023-11-28 18:25:15,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3618320.0, ans=0.1 2023-11-28 18:25:27,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3618386.6666666665, ans=0.0 2023-11-28 18:25:29,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3618386.6666666665, ans=0.0 2023-11-28 18:25:30,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3618386.6666666665, ans=0.125 2023-11-28 18:25:33,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3618386.6666666665, ans=0.0 2023-11-28 18:25:40,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3618453.3333333335, ans=0.2 2023-11-28 18:25:43,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3618453.3333333335, ans=0.0 2023-11-28 18:25:49,852 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1700, loss[loss=0.08664, simple_loss=0.1307, pruned_loss=0.01611, audio_tagging_loss=0.005167, over 16340.00 frames. 
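In the `optim.py:476` records, the five grad-norm numbers read as min, 25%, median, 75% and max over recent batches, and in every entry the reported threshold equals `Clipping_scale` times the median (for example 2.0 × 9.762e+01 = 1.952e+02 just above). A sketch of that bookkeeping, with the window size and quantile method as assumptions:

```python
from collections import deque

import torch

norms = deque(maxlen=128)  # recent per-batch gradient norms (window size assumed)
clipping_scale = 2.0


def record_and_threshold(grad_norm: float) -> float:
    """Track grad norms; clipping threshold = clipping_scale * running median."""
    norms.append(grad_norm)
    t = torch.tensor(list(norms))
    quartiles = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2].item()
    print("grad-norm quartiles", [f"{q:.3e}" for q in quartiles.tolist()],
          f"threshold={threshold:.3e}")
    return threshold
```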
], tot_loss[loss=0.06521, simple_loss=0.08896, pruned_loss=0.01197, audio_tagging_loss=0.008759, over 3042487.76 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:25:51,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-28 18:25:57,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.880e+01 9.352e+01 1.002e+02 1.354e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 18:26:15,583 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542800 2023-11-28 18:26:20,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3618653.3333333335, ans=0.0 2023-11-28 18:26:52,314 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1750, loss[loss=0.04746, simple_loss=0.05514, pruned_loss=0.006885, audio_tagging_loss=0.013, over 16525.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08872, pruned_loss=0.01197, audio_tagging_loss=0.008713, over 3048806.16 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:26:57,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3618853.3333333335, ans=0.125 2023-11-28 18:27:09,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3618920.0, ans=0.1 2023-11-28 18:27:17,727 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542850 2023-11-28 18:27:21,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3618986.6666666665, ans=0.0 2023-11-28 18:27:43,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3619120.0, ans=0.2 2023-11-28 18:27:54,813 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1800, loss[loss=0.04828, simple_loss=0.06694, pruned_loss=0.006326, audio_tagging_loss=0.008483, over 14102.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08918, pruned_loss=0.01197, audio_tagging_loss=0.008653, over 3049001.34 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:28:02,577 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.817e+01 9.553e+01 1.013e+02 1.527e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 18:28:02,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3619186.6666666665, ans=0.125 2023-11-28 18:28:16,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3619253.3333333335, ans=0.0 2023-11-28 18:28:19,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542900 2023-11-28 18:28:22,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3619320.0, ans=0.125 2023-11-28 18:28:23,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.14 vs. 
limit=15.0 2023-11-28 18:28:24,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3619320.0, ans=0.2 2023-11-28 18:28:41,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3619386.6666666665, ans=0.125 2023-11-28 18:28:43,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3619453.3333333335, ans=0.125 2023-11-28 18:28:56,456 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1850, loss[loss=0.05914, simple_loss=0.0786, pruned_loss=0.00874, audio_tagging_loss=0.0111, over 16922.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.0889, pruned_loss=0.012, audio_tagging_loss=0.008629, over 3046944.59 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:28:59,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0 2023-11-28 18:29:17,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3619586.6666666665, ans=0.0 2023-11-28 18:29:20,999 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542950 2023-11-28 18:29:40,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3619720.0, ans=0.125 2023-11-28 18:29:41,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3619720.0, ans=0.125 2023-11-28 18:29:45,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2023-11-28 18:29:58,051 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1900, loss[loss=0.05644, simple_loss=0.07114, pruned_loss=0.01155, audio_tagging_loss=0.00932, over 14575.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08908, pruned_loss=0.01197, audio_tagging_loss=0.008548, over 3047672.01 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:30:02,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=22.5 2023-11-28 18:30:06,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.846e+01 9.695e+01 1.030e+02 1.290e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 18:30:15,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3619920.0, ans=0.0 2023-11-28 18:30:20,735 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:30:24,850 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543000 2023-11-28 18:30:59,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3620120.0, ans=0.0 2023-11-28 18:31:01,997 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1950, loss[loss=0.06387, simple_loss=0.08538, pruned_loss=0.01239, audio_tagging_loss=0.008795, over 14893.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08756, pruned_loss=0.01167, audio_tagging_loss=0.008541, over 3047458.84 frames. 
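The `scaling.py:1022` Whitening lines above compare a per-module statistic against a fixed limit (6.0, 12.0, 15.0 or 22.5); in icefall a corrective gradient is applied only when the metric exceeds its limit. The exact metric is internal to scaling.py, so the sketch below substitutes one plausible whiteness measure, the dispersion of the feature covariance's eigenvalues, which equals 1.0 exactly when the features are white:

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Eigenvalue dispersion of the feature covariance, >= 1.0; 1.0 iff white.

    x: (num_frames, num_channels). The real scaling.py metric may differ;
    this is an assumed stand-in with the same qualitative behaviour.
    """
    metrics = []
    for g in x.chunk(num_groups, dim=1):  # split channels into groups
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return sum(metrics) / len(metrics)


x = torch.randn(1000, 384)    # near-white input
print(whitening_metric(x))    # ~1.0, comfortably under a limit of 15.0
```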
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:31:06,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3620186.6666666665, ans=0.125 2023-11-28 18:31:22,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3620253.3333333335, ans=0.0 2023-11-28 18:31:22,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3620253.3333333335, ans=0.125 2023-11-28 18:31:27,042 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543050 2023-11-28 18:31:52,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3620453.3333333335, ans=0.0 2023-11-28 18:31:54,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3620453.3333333335, ans=0.125 2023-11-28 18:32:00,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3620453.3333333335, ans=0.125 2023-11-28 18:32:05,232 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2000, loss[loss=0.05055, simple_loss=0.06383, pruned_loss=0.007768, audio_tagging_loss=0.01087, over 14342.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.0873, pruned_loss=0.01166, audio_tagging_loss=0.008644, over 3053140.38 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:32:12,237 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.843e+01 9.517e+01 1.017e+02 1.675e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 18:32:14,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=12.0 2023-11-28 18:32:23,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3620586.6666666665, ans=0.125 2023-11-28 18:32:26,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3620586.6666666665, ans=0.125 2023-11-28 18:32:30,310 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543100 2023-11-28 18:32:30,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-28 18:33:07,932 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2050, loss[loss=0.06074, simple_loss=0.089, pruned_loss=0.009957, audio_tagging_loss=0.006286, over 15386.00 frames. ], tot_loss[loss=0.06408, simple_loss=0.08761, pruned_loss=0.01165, audio_tagging_loss=0.008621, over 3045512.30 frames. 
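`grad_scale` in the batch records above steps between 8.0, 16.0 and 32.0, which is standard fp16 dynamic loss scaling: the scale is halved when a step produces inf/nan gradients and doubled again after a long run of clean steps. PyTorch's `GradScaler` implements exactly this; a minimal usage sketch with a toy model and optimizer (requires a CUDA device):

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# Halve the scale on overflow, double it after `growth_interval` clean steps.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

for _ in range(10):
    x = torch.randn(4, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, skips the step on inf/nan
    scaler.update()                # adjusts the scale, as logged in grad_scale
print(scaler.get_scale())
```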
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:33:27,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3620920.0, ans=0.125 2023-11-28 18:33:32,695 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543150 2023-11-28 18:34:01,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3621120.0, ans=0.125 2023-11-28 18:34:07,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3621120.0, ans=0.125 2023-11-28 18:34:09,459 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2100, loss[loss=0.05612, simple_loss=0.06543, pruned_loss=0.01054, audio_tagging_loss=0.01287, over 13451.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08835, pruned_loss=0.01188, audio_tagging_loss=0.008591, over 3048219.10 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:34:17,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.878e+01 9.444e+01 1.002e+02 1.258e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 18:34:20,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3621186.6666666665, ans=0.05 2023-11-28 18:34:34,187 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543200 2023-11-28 18:34:38,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3621320.0, ans=0.125 2023-11-28 18:34:58,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3621453.3333333335, ans=0.1 2023-11-28 18:35:12,353 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2150, loss[loss=0.06417, simple_loss=0.08747, pruned_loss=0.01257, audio_tagging_loss=0.007869, over 15855.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08941, pruned_loss=0.01193, audio_tagging_loss=0.008615, over 3048775.82 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:35:29,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3621586.6666666665, ans=0.125 2023-11-28 18:35:36,805 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543250 2023-11-28 18:35:49,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3621720.0, ans=0.2 2023-11-28 18:35:50,166 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:36:07,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3621786.6666666665, ans=0.125 2023-11-28 18:36:14,605 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2200, loss[loss=0.06839, simple_loss=0.08603, pruned_loss=0.01469, audio_tagging_loss=0.01068, over 15784.00 frames. 
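The WARNING entries above drop 1-second AudioSet placeholder cuts because only 23 encoder frames survive 4x subsampling of the 100 input frames, fewer than the 24 tokens, leaving the transducer loss with no valid alignment. A sketch of such a filter; the exact subsampling arithmetic is an assumption, chosen to reproduce the logged 100 -> 23:

```python
def frames_after_subsampling(num_frames: int) -> int:
    """Conv front end with overall 4x subsampling (assumed arithmetic)."""
    return ((num_frames - 7) // 2 + 1) // 2


def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A cut is unusable when fewer encoder frames than tokens remain.
    return frames_after_subsampling(num_frames) >= num_tokens


print(frames_after_subsampling(100))  # 23, matching the warning above
print(keep_cut(100, 24))              # False -> "Exclude cut ..."
```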
], tot_loss[loss=0.06544, simple_loss=0.08964, pruned_loss=0.012, audio_tagging_loss=0.008615, over 3051730.44 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:36:22,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=15.0 2023-11-28 18:36:22,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 9.070e+01 9.676e+01 1.027e+02 1.399e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 18:36:38,704 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543300 2023-11-28 18:36:48,481 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:36:58,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3622053.3333333335, ans=0.05 2023-11-28 18:37:08,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3622120.0, ans=0.125 2023-11-28 18:37:12,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3622120.0, ans=0.0 2023-11-28 18:37:16,416 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2250, loss[loss=0.08779, simple_loss=0.133, pruned_loss=0.01493, audio_tagging_loss=0.006346, over 16086.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09035, pruned_loss=0.01208, audio_tagging_loss=0.008608, over 3051383.40 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:37:24,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3622186.6666666665, ans=0.125 2023-11-28 18:37:40,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3622320.0, ans=0.125 2023-11-28 18:37:41,263 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543350 2023-11-28 18:37:41,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3622320.0, ans=0.125 2023-11-28 18:37:53,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3622386.6666666665, ans=0.125 2023-11-28 18:38:17,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3622520.0, ans=0.1 2023-11-28 18:38:17,960 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2300, loss[loss=0.0693, simple_loss=0.09904, pruned_loss=0.01282, audio_tagging_loss=0.006961, over 15406.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09085, pruned_loss=0.01225, audio_tagging_loss=0.008573, over 3054517.78 frames. 
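The reported components combine, at this stage of training, as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the batch 2200 per-batch record above checks out to within rounding. The 0.5 reads as a simple-loss weight and the unit weights as the pruned and tagging scales, though the trainer may apply warmup-dependent weights earlier in training:

```python
# Arithmetic check against the batch 2200 per-batch record above. The 0.5
# weight is inferred from the logged numbers, not read from train_asr.py.
simple_loss, pruned_loss, audio_tagging_loss = 0.08603, 0.01469, 0.01068
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(abs(loss - 0.06839) < 1e-5)  # True: matches the logged loss
```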
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:38:26,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.756e+01 9.268e+01 1.034e+02 1.497e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 18:38:39,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3622586.6666666665, ans=0.1 2023-11-28 18:38:41,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3622653.3333333335, ans=0.125 2023-11-28 18:38:42,551 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543400 2023-11-28 18:38:42,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3622653.3333333335, ans=0.125 2023-11-28 18:39:14,304 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:39:15,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3622786.6666666665, ans=0.125 2023-11-28 18:39:20,161 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2350, loss[loss=0.06002, simple_loss=0.07928, pruned_loss=0.01185, audio_tagging_loss=0.008525, over 14938.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09038, pruned_loss=0.01214, audio_tagging_loss=0.008637, over 3051127.54 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:39:32,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3622920.0, ans=0.0 2023-11-28 18:39:32,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3622920.0, ans=0.0 2023-11-28 18:39:38,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3622920.0, ans=0.125 2023-11-28 18:39:45,249 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543450 2023-11-28 18:40:03,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-11-28 18:40:21,990 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2400, loss[loss=0.06118, simple_loss=0.08643, pruned_loss=0.008641, audio_tagging_loss=0.009327, over 15335.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08921, pruned_loss=0.01193, audio_tagging_loss=0.008813, over 3044097.78 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:40:30,763 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.938e+01 9.455e+01 1.032e+02 1.610e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 18:40:34,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3623253.3333333335, ans=0.125 2023-11-28 18:40:39,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-28 18:40:46,801 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543500 2023-11-28 18:41:23,602 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2450, loss[loss=0.07972, simple_loss=0.1082, pruned_loss=0.01538, audio_tagging_loss=0.01023, over 15025.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0893, pruned_loss=0.01221, audio_tagging_loss=0.008806, over 3047698.33 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:41:24,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3623520.0, ans=0.125 2023-11-28 18:41:37,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3623586.6666666665, ans=0.125 2023-11-28 18:41:40,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=12.0 2023-11-28 18:41:49,407 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543550 2023-11-28 18:41:54,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3623653.3333333335, ans=0.0 2023-11-28 18:42:10,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3623720.0, ans=0.0 2023-11-28 18:42:21,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3623786.6666666665, ans=0.125 2023-11-28 18:42:25,835 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2500, loss[loss=0.06127, simple_loss=0.08365, pruned_loss=0.01351, audio_tagging_loss=0.005935, over 15026.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08934, pruned_loss=0.01223, audio_tagging_loss=0.008862, over 3045343.99 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:42:35,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.803e+01 9.255e+01 1.000e+02 1.311e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 18:42:51,443 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543600 2023-11-28 18:43:06,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3624053.3333333335, ans=0.07 2023-11-28 18:43:28,631 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2550, loss[loss=0.0535, simple_loss=0.06882, pruned_loss=0.01018, audio_tagging_loss=0.008912, over 15130.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08891, pruned_loss=0.01218, audio_tagging_loss=0.008819, over 3047795.72 frames. 
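Names like `balancer1.prob` and `balancer2.min_positive` above belong to activation balancers, which nudge per-channel statistics (fraction of positive activations, magnitudes) back into a configured range; `prob` is the probability that the correction fires on a given batch. The real module reshapes gradients in the backward pass; the schematic below instead returns an explicit penalty term, using a soft positive-fraction so it stays differentiable:

```python
import random

import torch


def balancer_penalty(x: torch.Tensor, min_positive: float = 0.05,
                     max_positive: float = 0.95, prob: float = 0.125) -> torch.Tensor:
    """Schematic balancer (the real scaling.py module edits gradients directly).

    With probability `prob`, penalize channels whose soft fraction of positive
    activations falls outside [min_positive, max_positive].
    """
    if random.random() >= prob:
        return x.new_zeros(())                     # no correction on most batches
    frac_pos = torch.sigmoid(x / 0.1).mean(dim=0)  # differentiable proxy for P(x>0)
    low = (min_positive - frac_pos).clamp(min=0.0)
    high = (frac_pos - max_positive).clamp(min=0.0)
    return (low + high).sum()


x = torch.randn(200, 384, requires_grad=True)
print(balancer_penalty(x, prob=1.0))  # ~0 for roughly centered activations
```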
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:43:41,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3624253.3333333335, ans=0.125 2023-11-28 18:43:51,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3624253.3333333335, ans=0.025 2023-11-28 18:43:51,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3624253.3333333335, ans=0.1 2023-11-28 18:43:53,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543650 2023-11-28 18:44:01,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3624320.0, ans=0.125 2023-11-28 18:44:05,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3624386.6666666665, ans=0.125 2023-11-28 18:44:24,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3624453.3333333335, ans=0.1 2023-11-28 18:44:25,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=22.5 2023-11-28 18:44:29,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3624520.0, ans=0.0 2023-11-28 18:44:30,685 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2600, loss[loss=0.04348, simple_loss=0.05599, pruned_loss=0.008519, audio_tagging_loss=0.006967, over 14392.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08852, pruned_loss=0.01217, audio_tagging_loss=0.00862, over 3045983.04 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:44:36,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3624520.0, ans=0.125 2023-11-28 18:44:36,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3624520.0, ans=0.2 2023-11-28 18:44:39,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.738e+01 9.385e+01 1.004e+02 1.373e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 18:44:39,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3624520.0, ans=0.1 2023-11-28 18:44:44,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3624586.6666666665, ans=0.125 2023-11-28 18:44:48,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3624586.6666666665, ans=0.0 2023-11-28 18:44:48,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3624586.6666666665, ans=0.125 2023-11-28 18:44:56,229 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543700 2023-11-28 18:45:08,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3624720.0, ans=0.125 2023-11-28 18:45:14,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3624720.0, ans=0.125 2023-11-28 18:45:31,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3624853.3333333335, ans=0.125 2023-11-28 18:45:32,756 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2650, loss[loss=0.05484, simple_loss=0.07716, pruned_loss=0.008532, audio_tagging_loss=0.007726, over 13828.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08901, pruned_loss=0.01209, audio_tagging_loss=0.008573, over 3044435.25 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:45:58,421 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543750 2023-11-28 18:46:33,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2023-11-28 18:46:35,467 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2700, loss[loss=0.06856, simple_loss=0.09732, pruned_loss=0.01204, audio_tagging_loss=0.007859, over 15227.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.0891, pruned_loss=0.01219, audio_tagging_loss=0.008577, over 3040755.11 frames. 
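The `*_skip_rate` parameters above (`attention_skip_rate`, `ff2_skip_rate`, `conv_skip_rate`, all at 0.0 by now) are stochastic-depth probabilities: early in training a sub-module is skipped for an entire batch with this probability, and the rate is scheduled down to zero. A minimal wrapper expressing that idea (the residual form is an assumption):

```python
import torch
from torch import nn


class SkippableModule(nn.Module):
    """Stochastic-depth wrapper: skip the wrapped module with prob skip_rate."""

    def __init__(self, module: nn.Module, skip_rate: float = 0.0):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate  # a ScheduledFloat in the real model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x                # identity: sub-module skipped this batch
        return x + self.module(x)   # residual application


layer = SkippableModule(nn.Linear(256, 256), skip_rate=0.0)
print(layer(torch.randn(10, 256)).shape)
```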
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:46:35,720 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:46:44,264 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.009e+01 9.559e+01 1.022e+02 1.303e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 18:46:59,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3625320.0, ans=0.125 2023-11-28 18:47:00,353 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543800 2023-11-28 18:47:12,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3625386.6666666665, ans=0.0 2023-11-28 18:47:17,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2023-11-28 18:47:37,947 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2750, loss[loss=0.06773, simple_loss=0.0932, pruned_loss=0.01345, audio_tagging_loss=0.007682, over 15369.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08905, pruned_loss=0.01216, audio_tagging_loss=0.008509, over 3048583.44 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:47:38,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3625520.0, ans=0.125 2023-11-28 18:47:40,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3625520.0, ans=0.125 2023-11-28 18:47:49,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0 2023-11-28 18:47:55,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3625586.6666666665, ans=0.0 2023-11-28 18:48:02,716 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543850 2023-11-28 18:48:32,314 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:48:34,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3625786.6666666665, ans=0.125 2023-11-28 18:48:36,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3625786.6666666665, ans=0.0 2023-11-28 18:48:39,507 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2800, loss[loss=0.07039, simple_loss=0.09501, pruned_loss=0.01428, audio_tagging_loss=0.008601, over 14940.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.0894, pruned_loss=0.01223, audio_tagging_loss=0.008498, over 3051234.73 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:48:39,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3625853.3333333335, ans=0.125 2023-11-28 18:48:39,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3625853.3333333335, ans=0.0 2023-11-28 18:48:49,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.943e+01 9.576e+01 1.040e+02 1.629e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 18:48:53,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2023-11-28 18:48:55,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3625920.0, ans=0.125 2023-11-28 18:49:05,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543900 2023-11-28 18:49:10,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3625986.6666666665, ans=0.125 2023-11-28 18:49:15,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0 2023-11-28 18:49:41,560 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2850, loss[loss=0.06208, simple_loss=0.08715, pruned_loss=0.01035, audio_tagging_loss=0.008155, over 14318.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08975, pruned_loss=0.01236, audio_tagging_loss=0.008492, over 3046697.62 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:49:44,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3626186.6666666665, ans=0.1 2023-11-28 18:49:44,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3626186.6666666665, ans=0.0 2023-11-28 18:50:00,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3626253.3333333335, ans=0.0 2023-11-28 18:50:06,809 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543950 2023-11-28 18:50:36,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.62 vs. limit=22.5 2023-11-28 18:50:43,900 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2900, loss[loss=0.06937, simple_loss=0.09303, pruned_loss=0.01267, audio_tagging_loss=0.01019, over 14622.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08939, pruned_loss=0.01223, audio_tagging_loss=0.00855, over 3044429.06 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:50:55,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.790e+01 9.510e+01 1.033e+02 1.199e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 18:51:03,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3626586.6666666665, ans=0.2 2023-11-28 18:51:08,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544000 2023-11-28 18:51:10,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3626653.3333333335, ans=0.07 2023-11-28 18:51:15,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3626653.3333333335, ans=0.2 2023-11-28 18:51:28,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3626720.0, ans=0.2 2023-11-28 18:51:40,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.53 vs. limit=22.5 2023-11-28 18:51:48,521 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2950, loss[loss=0.05082, simple_loss=0.0661, pruned_loss=0.006087, audio_tagging_loss=0.01168, over 14746.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08991, pruned_loss=0.0122, audio_tagging_loss=0.008608, over 3044581.11 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:51:53,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=12.0 2023-11-28 18:51:57,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3626853.3333333335, ans=0.0 2023-11-28 18:51:58,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3626853.3333333335, ans=0.125 2023-11-28 18:52:13,292 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544050 2023-11-28 18:52:21,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3626986.6666666665, ans=0.2 2023-11-28 18:52:24,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3627053.3333333335, ans=0.2 2023-11-28 18:52:41,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627120.0, ans=0.1 2023-11-28 18:52:41,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3627120.0, ans=0.2 2023-11-28 18:52:42,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3627120.0, ans=0.0 2023-11-28 18:52:45,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3627120.0, ans=0.125 2023-11-28 18:52:50,276 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3000, loss[loss=0.08122, simple_loss=0.1189, pruned_loss=0.01621, audio_tagging_loss=0.005554, over 15619.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08946, pruned_loss=0.01208, audio_tagging_loss=0.008666, over 3052228.93 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:52:50,277 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 18:53:33,223 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05731, simple_loss=0.05055, pruned_loss=0.005328, audio_tagging_loss=0.02671, over 4681554.00 frames. 2023-11-28 18:53:33,224 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 18:53:42,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3627186.6666666665, ans=0.0 2023-11-28 18:53:44,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 9.011e+01 9.606e+01 1.015e+02 1.587e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 18:53:53,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3627253.3333333335, ans=0.0 2023-11-28 18:53:57,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544100 2023-11-28 18:53:58,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3627320.0, ans=0.05 2023-11-28 18:54:02,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3627320.0, ans=0.0 2023-11-28 18:54:10,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3627386.6666666665, ans=0.0 2023-11-28 18:54:34,903 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3050, loss[loss=0.08558, simple_loss=0.1153, pruned_loss=0.02043, audio_tagging_loss=0.007516, over 16414.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08955, pruned_loss=0.01208, audio_tagging_loss=0.008726, over 3052385.99 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:54:36,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3627520.0, ans=0.125 2023-11-28 18:54:51,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3627586.6666666665, ans=0.125 2023-11-28 18:54:59,368 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544150 2023-11-28 18:55:13,397 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:55:34,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3627786.6666666665, ans=0.125 2023-11-28 18:55:35,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=12.0 2023-11-28 18:55:37,564 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3100, loss[loss=0.08342, simple_loss=0.1205, pruned_loss=0.0141, audio_tagging_loss=0.009071, over 15886.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08948, pruned_loss=0.01199, audio_tagging_loss=0.008785, over 3049056.43 frames. 
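At batch 3000 the trainer pauses for a validation pass and then logs the peak CUDA memory, as in the records above. A self-contained sketch of that loop; `compute_loss` is a toy stand-in for the trainer's real loss computation:

```python
import torch


def compute_loss(model, batch):
    """Toy stand-in for the trainer's real loss computation (hypothetical)."""
    features, num_frames = batch
    return model(features).square().mean(), num_frames


def validate(model, valid_loader):
    """Sketch of the periodic validation pass logged above."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    print(f"validation: loss={tot_loss / tot_frames:.4g}")
    if torch.cuda.is_available():
        mb = torch.cuda.max_memory_allocated() // 2**20
        print(f"Maximum memory allocated so far is {mb}MB")


model = torch.nn.Linear(80, 500)
loader = [(torch.randn(4, 80), 4 * 100) for _ in range(3)]
validate(model, loader)
```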
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:55:39,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3627853.3333333335, ans=0.05 2023-11-28 18:55:48,880 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 9.039e+01 9.695e+01 1.074e+02 1.445e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 18:56:03,216 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544200 2023-11-28 18:56:09,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3627986.6666666665, ans=0.0 2023-11-28 18:56:12,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3627986.6666666665, ans=0.125 2023-11-28 18:56:15,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3628053.3333333335, ans=0.0 2023-11-28 18:56:33,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3628120.0, ans=0.125 2023-11-28 18:56:39,965 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3150, loss[loss=0.06501, simple_loss=0.08871, pruned_loss=0.01105, audio_tagging_loss=0.009601, over 15248.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09017, pruned_loss=0.01227, audio_tagging_loss=0.008848, over 3051331.64 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:56:43,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-11-28 18:56:49,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3628186.6666666665, ans=0.125 2023-11-28 18:57:00,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3628253.3333333335, ans=0.125 2023-11-28 18:57:05,159 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544250 2023-11-28 18:57:12,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-11-28 18:57:12,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3628320.0, ans=0.5 2023-11-28 18:57:14,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3628320.0, ans=0.125 2023-11-28 18:57:42,612 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3200, loss[loss=0.0647, simple_loss=0.0906, pruned_loss=0.01098, audio_tagging_loss=0.008412, over 16929.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08961, pruned_loss=0.01227, audio_tagging_loss=0.008874, over 3050295.31 frames. 
], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:57:47,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3628520.0, ans=0.125 2023-11-28 18:57:49,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3628520.0, ans=0.0 2023-11-28 18:57:52,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 9.188e+01 9.825e+01 1.034e+02 1.228e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-28 18:57:58,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3628586.6666666665, ans=0.1 2023-11-28 18:58:01,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3628586.6666666665, ans=0.1 2023-11-28 18:58:06,994 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544300 2023-11-28 18:58:11,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3628653.3333333335, ans=0.025 2023-11-28 18:58:44,569 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3250, loss[loss=0.08442, simple_loss=0.1173, pruned_loss=0.01865, audio_tagging_loss=0.007111, over 15164.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08999, pruned_loss=0.01229, audio_tagging_loss=0.008901, over 3048563.77 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:59:00,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3628920.0, ans=0.125 2023-11-28 18:59:05,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3628920.0, ans=0.1 2023-11-28 18:59:09,816 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544350 2023-11-28 18:59:15,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3628986.6666666665, ans=0.1 2023-11-28 18:59:16,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3628986.6666666665, ans=0.125 2023-11-28 18:59:19,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3628986.6666666665, ans=0.0 2023-11-28 18:59:19,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2023-11-28 18:59:22,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3629053.3333333335, ans=0.0 2023-11-28 18:59:25,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3629053.3333333335, ans=0.1 2023-11-28 18:59:46,066 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3300, loss[loss=0.07819, simple_loss=0.1106, pruned_loss=0.01313, audio_tagging_loss=0.009786, over 15020.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09015, pruned_loss=0.01238, audio_tagging_loss=0.008977, over 3048389.43 frames. 
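The learning rate sits at 1.47e-03 across all of these records. Assuming icefall's Eden schedule with base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, and an epoch count of 45 completed epochs, the usual formula reproduces the logged value; both the exponents and the epoch convention are assumptions here:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden schedule in its commonly published form (assumed)."""
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor


# Batch index ~544,000 after 45 completed epochs gives the logged lr.
print(f"{eden_lr(0.045, 544000, 45):.2e}")  # -> 1.47e-03
```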
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:59:57,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.009e+01 9.919e+01 1.085e+02 1.499e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-28 19:00:07,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3629253.3333333335, ans=0.125 2023-11-28 19:00:10,726 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544400 2023-11-28 19:00:12,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3629320.0, ans=0.125 2023-11-28 19:00:14,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3629320.0, ans=0.0 2023-11-28 19:00:37,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3629453.3333333335, ans=0.0 2023-11-28 19:00:42,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=15.0 2023-11-28 19:00:44,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3629453.3333333335, ans=0.125 2023-11-28 19:00:44,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3629453.3333333335, ans=0.0 2023-11-28 19:00:48,550 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3350, loss[loss=0.05077, simple_loss=0.06644, pruned_loss=0.009561, audio_tagging_loss=0.007993, over 15393.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08954, pruned_loss=0.01215, audio_tagging_loss=0.008866, over 3051368.20 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:00:48,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3629520.0, ans=0.0 2023-11-28 19:01:06,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=12.0 2023-11-28 19:01:12,619 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544450 2023-11-28 19:01:43,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3629786.6666666665, ans=0.2 2023-11-28 19:01:45,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3629786.6666666665, ans=0.125 2023-11-28 19:01:49,486 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3400, loss[loss=0.07184, simple_loss=0.1016, pruned_loss=0.01191, audio_tagging_loss=0.00915, over 16001.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08969, pruned_loss=0.01224, audio_tagging_loss=0.00876, over 3051663.72 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:02:01,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 9.096e+01 9.800e+01 1.047e+02 1.329e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-28 19:02:10,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. 
limit=12.0 2023-11-28 19:02:14,682 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544500 2023-11-28 19:02:24,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.88 vs. limit=15.0 2023-11-28 19:02:24,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3629986.6666666665, ans=0.0 2023-11-28 19:02:33,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3630053.3333333335, ans=0.0 2023-11-28 19:02:51,158 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3450, loss[loss=0.06006, simple_loss=0.07583, pruned_loss=0.01247, audio_tagging_loss=0.009677, over 14425.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08987, pruned_loss=0.01205, audio_tagging_loss=0.008577, over 3054807.18 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:03:10,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3630253.3333333335, ans=0.1 2023-11-28 19:03:16,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544550 2023-11-28 19:03:19,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3630320.0, ans=0.125 2023-11-28 19:03:19,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2023-11-28 19:03:27,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-11-28 19:03:35,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3630386.6666666665, ans=0.125 2023-11-28 19:03:53,798 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3500, loss[loss=0.06026, simple_loss=0.08461, pruned_loss=0.009905, audio_tagging_loss=0.008052, over 15012.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0906, pruned_loss=0.01215, audio_tagging_loss=0.008522, over 3055231.51 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:04:06,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.809e+01 9.584e+01 1.024e+02 1.310e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 19:04:14,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3630586.6666666665, ans=0.125 2023-11-28 19:04:18,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544600 2023-11-28 19:04:19,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3630653.3333333335, ans=0.1 2023-11-28 19:04:26,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3630653.3333333335, ans=0.125 2023-11-28 19:04:27,704 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:04:27,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3630653.3333333335, ans=0.125 2023-11-28 19:04:32,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.58 vs. limit=10.0 2023-11-28 19:04:38,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.05 vs. limit=10.0 2023-11-28 19:04:56,007 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3550, loss[loss=0.07533, simple_loss=0.1147, pruned_loss=0.01037, audio_tagging_loss=0.007636, over 16100.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09012, pruned_loss=0.0121, audio_tagging_loss=0.008515, over 3055712.15 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:04:58,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3630853.3333333335, ans=0.0 2023-11-28 19:05:04,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3630853.3333333335, ans=0.95 2023-11-28 19:05:20,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544650 2023-11-28 19:05:56,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3631120.0, ans=0.0 2023-11-28 19:05:58,327 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3600, loss[loss=0.05492, simple_loss=0.07246, pruned_loss=0.01012, audio_tagging_loss=0.008563, over 13685.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08911, pruned_loss=0.01181, audio_tagging_loss=0.008552, over 3053732.70 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:06:02,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3631186.6666666665, ans=0.125 2023-11-28 19:06:12,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.905e+01 9.661e+01 1.038e+02 1.227e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 19:06:18,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3631253.3333333335, ans=0.0 2023-11-28 19:06:19,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3631253.3333333335, ans=0.0 2023-11-28 19:06:23,002 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544700 2023-11-28 19:06:37,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3631386.6666666665, ans=0.95 2023-11-28 19:06:56,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3631453.3333333335, ans=0.2 2023-11-28 19:07:00,636 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3650, loss[loss=0.05071, simple_loss=0.06123, pruned_loss=0.009448, audio_tagging_loss=0.01065, over 14414.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08962, pruned_loss=0.01187, audio_tagging_loss=0.008511, over 3051248.56 frames. 
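
The "Exclude cut" warnings above fire when a one-second AudioSet cut is too short for the pruned transducer loss: its 100 input frames survive only as 23 encoder frames after subsampling, fewer than the 24 BPE tokens of the placeholder transcript, so the cut is dropped. A minimal sketch of that check, assuming the usual ((T - 7) // 2 + 1) // 2 convolutional subsampling (an assumption that reproduces 100 -> 23; names are illustrative, not the actual icefall code):

    def frames_after_subsampling(num_frames: int) -> int:
        # conv front-end with overall subsampling factor 4 (assumed formula)
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # the transducer loss needs at least one encoder frame per token
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23   # matches the warning above
    assert not keep_cut(100, 24)                 # 23 frames < 24 tokens -> excluded
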
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:07:15,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3631586.6666666665, ans=0.125 2023-11-28 19:07:17,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3631586.6666666665, ans=0.0 2023-11-28 19:07:19,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3631586.6666666665, ans=0.2 2023-11-28 19:07:25,298 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544750 2023-11-28 19:07:26,629 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:07:49,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3631786.6666666665, ans=0.125 2023-11-28 19:08:01,840 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3700, loss[loss=0.0671, simple_loss=0.09421, pruned_loss=0.01338, audio_tagging_loss=0.006622, over 15079.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08968, pruned_loss=0.0119, audio_tagging_loss=0.008348, over 3047230.49 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:08:08,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3631853.3333333335, ans=0.0 2023-11-28 19:08:15,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 9.135e+01 9.668e+01 1.042e+02 1.211e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 19:08:27,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544800 2023-11-28 19:08:37,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3631986.6666666665, ans=0.1 2023-11-28 19:08:40,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3632053.3333333335, ans=0.1 2023-11-28 19:08:40,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3632053.3333333335, ans=0.125 2023-11-28 19:08:42,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3632053.3333333335, ans=0.125 2023-11-28 19:08:43,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3632053.3333333335, ans=0.2 2023-11-28 19:08:45,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3632053.3333333335, ans=0.0 2023-11-28 19:08:46,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3632053.3333333335, ans=0.125 2023-11-28 19:09:02,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-11-28 19:09:05,291 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3750, loss[loss=0.09046, simple_loss=0.1337, pruned_loss=0.018, audio_tagging_loss=0.005596, over 16019.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0903, pruned_loss=0.01213, audio_tagging_loss=0.008412, over 3053478.41 frames. 
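
The [optim.py:476] records summarize the distribution of recent total gradient norms. In every record here the threshold equals Clipping_scale times the middle quartile (e.g. 2.0 * 9.668e+01 = 1.934e+02 just above), and percent-clipped is the share of batches whose norm exceeded it. A sketch of that bookkeeping under those assumptions (illustrative, not the actual ScaledAdam implementation):

    import torch

    def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> None:
        """recent_norms: 1-D tensor of gradient norms since the last report."""
        q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = (clipping_scale * q[2]).item()          # scale * median
        pct = 100.0 * (recent_norms > threshold).float().mean().item()
        print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")
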
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:09:13,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3632186.6666666665, ans=0.0 2023-11-28 19:09:28,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3632253.3333333335, ans=0.125 2023-11-28 19:09:30,356 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544850 2023-11-28 19:09:32,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3632320.0, ans=0.1 2023-11-28 19:09:35,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2023-11-28 19:09:42,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632386.6666666665, ans=0.1 2023-11-28 19:09:50,375 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:10:08,025 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3800, loss[loss=0.06396, simple_loss=0.08162, pruned_loss=0.01203, audio_tagging_loss=0.01113, over 14638.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09037, pruned_loss=0.01206, audio_tagging_loss=0.008481, over 3047542.92 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:10:20,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3632586.6666666665, ans=0.2 2023-11-28 19:10:20,984 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 8.975e+01 9.556e+01 1.041e+02 1.200e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 19:10:32,435 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544900 2023-11-28 19:10:32,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3632653.3333333335, ans=0.125 2023-11-28 19:10:48,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3632720.0, ans=0.125 2023-11-28 19:11:02,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3632786.6666666665, ans=0.125 2023-11-28 19:11:06,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3632786.6666666665, ans=0.0 2023-11-28 19:11:07,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3632853.3333333335, ans=0.125 2023-11-28 19:11:08,699 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3850, loss[loss=0.08199, simple_loss=0.1132, pruned_loss=0.01651, audio_tagging_loss=0.008882, over 14274.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09006, pruned_loss=0.0121, audio_tagging_loss=0.008575, over 3039660.70 frames. 
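
In each loss[...] record the components reconcile as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for the batch-3800 utterance above, 0.5 * 0.08162 + 0.01203 + 0.01113 = 0.06397 ~ 0.06396. A sketch with the scales read off the logged numbers rather than out of the code:

    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
        # pruned-RNN-T pair plus the audio-tagging (distillation) term
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # batch 3800 above: 0.5 * 0.08162 + 0.01203 + 0.01113 ~= 0.06396
    assert abs(total_loss(0.08162, 0.01203, 0.01113) - 0.06396) < 1e-4
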
], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:11:09,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=22.5 2023-11-28 19:11:11,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0 2023-11-28 19:11:17,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3632853.3333333335, ans=0.125 2023-11-28 19:11:34,453 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544950 2023-11-28 19:11:44,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3632986.6666666665, ans=0.025 2023-11-28 19:11:52,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3633053.3333333335, ans=0.1 2023-11-28 19:11:59,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2023-11-28 19:12:10,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3633186.6666666665, ans=0.125 2023-11-28 19:12:11,427 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3900, loss[loss=0.05928, simple_loss=0.06737, pruned_loss=0.0113, audio_tagging_loss=0.01429, over 16692.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08952, pruned_loss=0.01216, audio_tagging_loss=0.008726, over 3044738.19 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:12:11,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=22.5 2023-11-28 19:12:21,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3633186.6666666665, ans=10.0 2023-11-28 19:12:24,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2023-11-28 19:12:26,149 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.937e+01 9.555e+01 1.040e+02 1.282e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 19:12:31,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3633253.3333333335, ans=0.0 2023-11-28 19:12:36,344 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545000 2023-11-28 19:12:39,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2023-11-28 19:12:47,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.24 vs. limit=10.0 2023-11-28 19:13:13,848 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3950, loss[loss=0.0679, simple_loss=0.08096, pruned_loss=0.01574, audio_tagging_loss=0.01168, over 14361.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0899, pruned_loss=0.01217, audio_tagging_loss=0.008833, over 3045476.97 frames. 
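
The grad_scale field stepping between 8.0, 16.0 and 32.0 across these records is the fp16 dynamic loss scale: halved when scaled gradients overflow, doubled again after a stretch of overflow-free steps. A schematic of the standard torch.cuda.amp machinery that behaves this way (constructor arguments are illustrative, and the recipe may use its own scaler subclass):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, backoff_factor=0.5,
                                       growth_factor=2.0, growth_interval=2000)

    # per training step (schematic; compute_loss/model/batch are placeholders):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   optimizer.zero_grad()
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)   # skipped when gradients contain inf/nan
    #   scaler.update()          # halves or doubles the value logged as grad_scale
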
], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:13:28,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3633586.6666666665, ans=0.1 2023-11-28 19:13:38,145 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545050 2023-11-28 19:14:15,544 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4000, loss[loss=0.0637, simple_loss=0.08342, pruned_loss=0.01259, audio_tagging_loss=0.009394, over 14902.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08983, pruned_loss=0.01215, audio_tagging_loss=0.008899, over 3043437.29 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:14:18,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3633853.3333333335, ans=0.1 2023-11-28 19:14:30,282 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 9.097e+01 9.894e+01 1.091e+02 1.423e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-28 19:14:37,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3633920.0, ans=0.125 2023-11-28 19:14:40,515 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545100 2023-11-28 19:14:42,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3633986.6666666665, ans=0.125 2023-11-28 19:15:11,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3634120.0, ans=0.125 2023-11-28 19:15:11,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-28 19:15:17,442 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4050, loss[loss=0.05198, simple_loss=0.06952, pruned_loss=0.008789, audio_tagging_loss=0.008428, over 15795.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0898, pruned_loss=0.01218, audio_tagging_loss=0.00892, over 3043680.03 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:15:22,262 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:15:22,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3634186.6666666665, ans=0.125 2023-11-28 19:15:24,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3634186.6666666665, ans=6.0 2023-11-28 19:15:42,997 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545150 2023-11-28 19:15:43,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3634320.0, ans=0.125 2023-11-28 19:15:49,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.05 vs. 
limit=12.0 2023-11-28 19:16:05,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3634453.3333333335, ans=0.0 2023-11-28 19:16:16,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3634453.3333333335, ans=0.125 2023-11-28 19:16:19,694 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4100, loss[loss=0.0817, simple_loss=0.1049, pruned_loss=0.02047, audio_tagging_loss=0.008768, over 14702.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09036, pruned_loss=0.01232, audio_tagging_loss=0.008914, over 3042484.83 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:16:27,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3634520.0, ans=0.125 2023-11-28 19:16:34,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 9.021e+01 9.586e+01 1.040e+02 1.361e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 19:16:43,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545200 2023-11-28 19:17:03,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.52 vs. limit=22.5 2023-11-28 19:17:08,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. limit=15.0 2023-11-28 19:17:17,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3634786.6666666665, ans=0.125 2023-11-28 19:17:21,141 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4150, loss[loss=0.06232, simple_loss=0.09356, pruned_loss=0.01049, audio_tagging_loss=0.005053, over 15074.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09029, pruned_loss=0.01229, audio_tagging_loss=0.008707, over 3044166.50 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:17:45,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-28 19:17:45,621 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545250 2023-11-28 19:17:51,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3634986.6666666665, ans=0.125 2023-11-28 19:18:06,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2023-11-28 19:18:08,573 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:18:22,684 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4200, loss[loss=0.05317, simple_loss=0.06627, pruned_loss=0.009931, audio_tagging_loss=0.01011, over 16312.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09026, pruned_loss=0.01229, audio_tagging_loss=0.008586, over 3041495.69 frames. 
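
The [scaling.py:213] lines dump ScheduledFloat values: module hyper-parameters (skip rates, balancer probabilities, dropout) whose logged "ans" is a function of batch_count. A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with clamped endpoints:

    class ScheduledFloat:
        """E.g. ScheduledFloat((0.0, 0.2), (20000.0, 0.0)) decays from 0.2
        to 0.0 over the first 20k batches, then stays at 0.0."""
        def __init__(self, *points) -> None:
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            return pts[-1][1]

    conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0))
    print(conv_skip_rate(3_634_453.0))   # far past the last breakpoint -> 0.0
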
], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:18:27,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3635186.6666666665, ans=0.2 2023-11-28 19:18:29,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0 2023-11-28 19:18:32,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3635186.6666666665, ans=0.035 2023-11-28 19:18:37,258 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.058e+01 9.549e+01 9.941e+01 2.004e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-28 19:18:42,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3635253.3333333335, ans=0.2 2023-11-28 19:18:45,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3635253.3333333335, ans=0.0 2023-11-28 19:18:46,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.16 vs. limit=15.0 2023-11-28 19:18:48,316 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545300 2023-11-28 19:19:02,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3635386.6666666665, ans=0.125 2023-11-28 19:19:09,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.45 vs. limit=10.0 2023-11-28 19:19:09,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3635386.6666666665, ans=0.0 2023-11-28 19:19:21,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3635453.3333333335, ans=0.125 2023-11-28 19:19:25,356 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4250, loss[loss=0.03122, simple_loss=0.03406, pruned_loss=0.003989, audio_tagging_loss=0.0102, over 14586.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08996, pruned_loss=0.01228, audio_tagging_loss=0.00854, over 3034889.19 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:19:28,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3635520.0, ans=0.2 2023-11-28 19:19:36,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3635520.0, ans=0.0 2023-11-28 19:19:51,018 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545350 2023-11-28 19:20:03,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=8.0 2023-11-28 19:20:18,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3635786.6666666665, ans=0.0 2023-11-28 19:20:23,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3635786.6666666665, ans=0.1 2023-11-28 19:20:28,732 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4300, loss[loss=0.0646, simple_loss=0.08886, pruned_loss=0.01139, audio_tagging_loss=0.008777, over 15117.00 frames. 
], tot_loss[loss=0.06569, simple_loss=0.08989, pruned_loss=0.01216, audio_tagging_loss=0.008592, over 3036132.09 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:20:33,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3635853.3333333335, ans=0.1 2023-11-28 19:20:37,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3635853.3333333335, ans=0.125 2023-11-28 19:20:42,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 9.029e+01 9.603e+01 1.044e+02 1.295e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 19:20:47,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3635920.0, ans=0.125 2023-11-28 19:20:52,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545400 2023-11-28 19:21:11,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3636053.3333333335, ans=0.125 2023-11-28 19:21:23,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3636120.0, ans=0.0 2023-11-28 19:21:29,132 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4350, loss[loss=0.05414, simple_loss=0.06361, pruned_loss=0.01216, audio_tagging_loss=0.01018, over 13922.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09029, pruned_loss=0.01227, audio_tagging_loss=0.008553, over 3031962.17 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:21:54,050 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545450 2023-11-28 19:22:25,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3636453.3333333335, ans=0.1 2023-11-28 19:22:25,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3636453.3333333335, ans=0.125 2023-11-28 19:22:31,064 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4400, loss[loss=0.06144, simple_loss=0.08334, pruned_loss=0.01234, audio_tagging_loss=0.007429, over 14712.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09083, pruned_loss=0.0124, audio_tagging_loss=0.008495, over 3042338.58 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:22:34,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3636520.0, ans=0.125 2023-11-28 19:22:46,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.163e+01 9.666e+01 1.055e+02 1.360e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 19:22:55,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3636653.3333333335, ans=0.125 2023-11-28 19:22:56,131 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545500 2023-11-28 19:23:03,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.69 vs. 
limit=12.0 2023-11-28 19:23:08,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3636720.0, ans=0.0 2023-11-28 19:23:17,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2023-11-28 19:23:18,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3636720.0, ans=10.0 2023-11-28 19:23:18,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3636720.0, ans=0.125 2023-11-28 19:23:23,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3636786.6666666665, ans=0.125 2023-11-28 19:23:31,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3636786.6666666665, ans=0.125 2023-11-28 19:23:33,501 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4450, loss[loss=0.0553, simple_loss=0.07231, pruned_loss=0.0134, audio_tagging_loss=0.005749, over 14877.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09035, pruned_loss=0.01209, audio_tagging_loss=0.008426, over 3049668.27 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:23:58,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545550 2023-11-28 19:24:08,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3636986.6666666665, ans=0.1 2023-11-28 19:24:35,784 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4500, loss[loss=0.07074, simple_loss=0.1081, pruned_loss=0.01103, audio_tagging_loss=0.005685, over 15602.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09023, pruned_loss=0.01215, audio_tagging_loss=0.008459, over 3043890.54 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:24:50,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2023-11-28 19:24:50,605 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.752e+01 9.380e+01 1.023e+02 1.206e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 19:24:50,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3637253.3333333335, ans=0.1 2023-11-28 19:25:00,923 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545600 2023-11-28 19:25:14,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3637386.6666666665, ans=0.1 2023-11-28 19:25:21,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3637386.6666666665, ans=0.125 2023-11-28 19:25:35,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3637453.3333333335, ans=0.2 2023-11-28 19:25:38,411 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4550, loss[loss=0.07455, simple_loss=0.1156, pruned_loss=0.01234, audio_tagging_loss=0.004395, over 15471.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08989, pruned_loss=0.01215, audio_tagging_loss=0.008493, over 3036144.88 frames. 
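
Each [scaling.py:1022] Whitening line measures how far a module's output is from "white" (channel covariance proportional to the identity) and compares it to that module's limit, e.g. metric=12.78 vs. limit=15.0 above. A sketch of one such whiteness statistic, E[lambda^2] / E[lambda]^2 over the eigenvalues of the channel covariance, which equals 1.0 for perfectly white features and grows as the covariance becomes lopsided (the exact formula in scaling.py is an assumption here):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels) activations of one module."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]           # (C, C) channel covariance
        eigs = torch.linalg.eigvalsh(cov)      # real eigenvalues, ascending
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    white = torch.randn(10_000, 512)           # iid channels
    print(whitening_metric(white))             # ~1.0, well under limit=15.0
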
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:26:03,951 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545650 2023-11-28 19:26:09,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3637653.3333333335, ans=0.125 2023-11-28 19:26:11,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.41 vs. limit=22.5 2023-11-28 19:26:24,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3637720.0, ans=0.125 2023-11-28 19:26:28,093 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:26:40,923 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4600, loss[loss=0.08233, simple_loss=0.1067, pruned_loss=0.02163, audio_tagging_loss=0.00734, over 17306.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08998, pruned_loss=0.01225, audio_tagging_loss=0.008491, over 3040836.26 frames. ], batch size: 65, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:26:55,385 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.776e+01 9.447e+01 1.031e+02 1.407e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 19:27:05,287 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545700 2023-11-28 19:27:06,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3637986.6666666665, ans=0.1 2023-11-28 19:27:15,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2023-11-28 19:27:28,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3638120.0, ans=0.2 2023-11-28 19:27:42,007 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4650, loss[loss=0.06433, simple_loss=0.08851, pruned_loss=0.01287, audio_tagging_loss=0.007205, over 14260.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08906, pruned_loss=0.01218, audio_tagging_loss=0.008709, over 3045626.06 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:27:48,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3638186.6666666665, ans=0.0 2023-11-28 19:27:50,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3638186.6666666665, ans=0.125 2023-11-28 19:28:00,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3638253.3333333335, ans=0.125 2023-11-28 19:28:05,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3638320.0, ans=0.125 2023-11-28 19:28:06,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545750 2023-11-28 19:28:06,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3638320.0, ans=0.2 2023-11-28 19:28:14,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3638320.0, ans=0.0 2023-11-28 19:28:22,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=12.0 2023-11-28 19:28:34,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3638453.3333333335, ans=0.1 2023-11-28 19:28:44,185 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4700, loss[loss=0.07813, simple_loss=0.1047, pruned_loss=0.01672, audio_tagging_loss=0.009078, over 15474.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09011, pruned_loss=0.01225, audio_tagging_loss=0.008713, over 3046983.23 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:28:50,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3638520.0, ans=0.1 2023-11-28 19:28:50,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3638520.0, ans=0.0 2023-11-28 19:28:51,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3638520.0, ans=0.125 2023-11-28 19:28:59,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3638586.6666666665, ans=0.0 2023-11-28 19:29:00,543 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 9.174e+01 9.774e+01 1.029e+02 1.399e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-28 19:29:09,097 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545800 2023-11-28 19:29:36,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3638786.6666666665, ans=0.125 2023-11-28 19:29:38,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3638786.6666666665, ans=0.035 2023-11-28 19:29:39,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3638786.6666666665, ans=0.125 2023-11-28 19:29:40,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. 
limit=22.5 2023-11-28 19:29:42,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3638786.6666666665, ans=0.2 2023-11-28 19:29:47,371 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4750, loss[loss=0.06808, simple_loss=0.09437, pruned_loss=0.01134, audio_tagging_loss=0.009558, over 15266.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09007, pruned_loss=0.01228, audio_tagging_loss=0.008718, over 3046771.32 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:29:48,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3638853.3333333335, ans=0.125 2023-11-28 19:29:55,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3638853.3333333335, ans=0.0 2023-11-28 19:30:05,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-11-28 19:30:07,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-11-28 19:30:09,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3638920.0, ans=0.125 2023-11-28 19:30:11,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545850 2023-11-28 19:30:17,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3638986.6666666665, ans=0.0 2023-11-28 19:30:33,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3639053.3333333335, ans=0.125 2023-11-28 19:30:48,668 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4800, loss[loss=0.07162, simple_loss=0.1006, pruned_loss=0.01373, audio_tagging_loss=0.007575, over 16864.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.0899, pruned_loss=0.01212, audio_tagging_loss=0.008798, over 3043940.52 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:30:56,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3639186.6666666665, ans=0.125 2023-11-28 19:31:05,143 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.815e+01 9.365e+01 1.036e+02 1.386e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 19:31:08,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2023-11-28 19:31:14,138 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545900 2023-11-28 19:31:27,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.23 vs. 
limit=10.0 2023-11-28 19:31:34,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3639386.6666666665, ans=0.0 2023-11-28 19:31:40,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3639453.3333333335, ans=0.125 2023-11-28 19:31:43,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2023-11-28 19:31:51,022 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4850, loss[loss=0.07468, simple_loss=0.1017, pruned_loss=0.01379, audio_tagging_loss=0.01004, over 14755.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09018, pruned_loss=0.0121, audio_tagging_loss=0.008821, over 3049249.11 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:31:51,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3639520.0, ans=0.125 2023-11-28 19:31:55,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3639520.0, ans=0.0 2023-11-28 19:32:04,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3639586.6666666665, ans=0.0 2023-11-28 19:32:15,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545950 2023-11-28 19:32:15,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3639653.3333333335, ans=0.0 2023-11-28 19:32:27,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3639720.0, ans=0.0 2023-11-28 19:32:29,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2023-11-28 19:32:39,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2023-11-28 19:32:52,919 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4900, loss[loss=0.05926, simple_loss=0.08373, pruned_loss=0.008948, audio_tagging_loss=0.008442, over 15859.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08967, pruned_loss=0.01191, audio_tagging_loss=0.008792, over 3049525.10 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:32:57,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3639853.3333333335, ans=0.2 2023-11-28 19:33:03,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3639853.3333333335, ans=0.1 2023-11-28 19:33:04,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3639920.0, ans=0.0 2023-11-28 19:33:10,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.839e+01 9.451e+01 1.014e+02 1.484e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 19:33:13,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3639920.0, ans=0.0 2023-11-28 19:33:17,179 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546000 2023-11-28 19:33:21,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3639986.6666666665, ans=0.125 2023-11-28 19:33:36,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3640053.3333333335, ans=0.2 2023-11-28 19:33:49,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3640120.0, ans=0.125 2023-11-28 19:33:54,876 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4950, loss[loss=0.05751, simple_loss=0.0827, pruned_loss=0.009052, audio_tagging_loss=0.007111, over 17066.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08913, pruned_loss=0.01188, audio_tagging_loss=0.008627, over 3042789.80 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:33:58,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3640186.6666666665, ans=0.125 2023-11-28 19:33:59,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3640186.6666666665, ans=0.1 2023-11-28 19:34:05,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3640253.3333333335, ans=0.125 2023-11-28 19:34:12,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3640253.3333333335, ans=0.125 2023-11-28 19:34:19,466 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546050 2023-11-28 19:34:28,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3640320.0, ans=0.125 2023-11-28 19:34:37,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3640386.6666666665, ans=0.2 2023-11-28 19:34:37,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-11-28 19:34:38,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3640386.6666666665, ans=0.125 2023-11-28 19:34:42,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. 
limit=12.0 2023-11-28 19:34:53,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3640453.3333333335, ans=0.0 2023-11-28 19:34:53,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3640453.3333333335, ans=0.125 2023-11-28 19:34:55,713 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5000, loss[loss=0.08757, simple_loss=0.119, pruned_loss=0.01806, audio_tagging_loss=0.009999, over 15549.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08945, pruned_loss=0.01204, audio_tagging_loss=0.008561, over 3036633.01 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:34:58,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3640520.0, ans=0.125 2023-11-28 19:35:00,094 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:35:13,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.947e+01 9.568e+01 1.019e+02 1.168e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:35:21,195 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546100 2023-11-28 19:35:41,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2023-11-28 19:35:49,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3640786.6666666665, ans=0.04949747468305833 2023-11-28 19:35:58,095 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5050, loss[loss=0.06698, simple_loss=0.09436, pruned_loss=0.01307, audio_tagging_loss=0.006726, over 15864.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08853, pruned_loss=0.01197, audio_tagging_loss=0.008565, over 3038822.56 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:36:22,310 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546150 2023-11-28 19:36:25,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3640986.6666666665, ans=15.0 2023-11-28 19:36:49,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3641120.0, ans=0.025 2023-11-28 19:36:52,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3641120.0, ans=0.0 2023-11-28 19:37:00,156 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5100, loss[loss=0.06159, simple_loss=0.08275, pruned_loss=0.01113, audio_tagging_loss=0.009092, over 16187.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08765, pruned_loss=0.01178, audio_tagging_loss=0.008528, over 3035639.37 frames. 
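
The balancer entries (prob ans=0.125, min_positive ans=0.025, max_positive ans=0.95 in records above) parameterize Balancer modules, which constrain per-channel activation statistics such as the fraction of positive values to stay inside [min_positive, max_positive], intervening only on a prob-sized random fraction of batches. A sketch of the statistic being constrained (illustrative, not the scaling.py internals):

    import torch

    def positive_fraction(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels); per-channel share of positives."""
        return (x > 0).float().mean(dim=0)

    x = torch.randn(1000, 256)
    frac = positive_fraction(x)                 # ~0.5 per channel
    min_positive, max_positive = 0.025, 0.95
    violating = (frac < min_positive) | (frac > max_positive)
    print(int(violating.sum()), "channels outside [min_positive, max_positive]")
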
], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:37:01,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3641186.6666666665, ans=0.1 2023-11-28 19:37:14,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3641253.3333333335, ans=0.0 2023-11-28 19:37:17,157 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.885e+01 9.689e+01 1.021e+02 1.449e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 19:37:22,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. limit=6.0 2023-11-28 19:37:24,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546200 2023-11-28 19:37:25,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2023-11-28 19:37:31,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3641320.0, ans=0.125 2023-11-28 19:37:44,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3641386.6666666665, ans=0.125 2023-11-28 19:37:56,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3641453.3333333335, ans=0.125 2023-11-28 19:37:57,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2023-11-28 19:37:59,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2023-11-28 19:37:59,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2023-11-28 19:38:00,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3641520.0, ans=0.1 2023-11-28 19:38:01,238 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5150, loss[loss=0.06178, simple_loss=0.08601, pruned_loss=0.01088, audio_tagging_loss=0.007898, over 16656.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.0881, pruned_loss=0.01182, audio_tagging_loss=0.008565, over 3040536.07 frames. 
], batch size: 65, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:38:02,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3641520.0, ans=0.1 2023-11-28 19:38:12,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3641520.0, ans=0.125 2023-11-28 19:38:20,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3641586.6666666665, ans=0.0 2023-11-28 19:38:27,206 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546250 2023-11-28 19:38:27,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3641653.3333333335, ans=0.1 2023-11-28 19:38:34,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3641653.3333333335, ans=0.125 2023-11-28 19:38:41,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3641720.0, ans=0.0 2023-11-28 19:38:51,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3641786.6666666665, ans=0.95 2023-11-28 19:38:56,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3641786.6666666665, ans=0.0 2023-11-28 19:39:04,428 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5200, loss[loss=0.07489, simple_loss=0.1077, pruned_loss=0.01626, audio_tagging_loss=0.004787, over 14956.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08992, pruned_loss=0.01217, audio_tagging_loss=0.008438, over 3038897.29 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:39:10,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3641853.3333333335, ans=0.125 2023-11-28 19:39:13,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3641853.3333333335, ans=0.125 2023-11-28 19:39:15,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3641920.0, ans=0.0 2023-11-28 19:39:22,725 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 9.099e+01 9.660e+01 1.024e+02 1.324e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 19:39:28,797 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546300 2023-11-28 19:40:00,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2023-11-28 19:40:06,038 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5250, loss[loss=0.06256, simple_loss=0.09563, pruned_loss=0.009957, audio_tagging_loss=0.004786, over 15581.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08995, pruned_loss=0.01225, audio_tagging_loss=0.008431, over 3044045.96 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:40:29,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3642320.0, ans=0.0 2023-11-28 19:40:30,098 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546350 2023-11-28 19:40:43,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3642386.6666666665, ans=0.0 2023-11-28 19:40:43,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3642386.6666666665, ans=0.04949747468305833 2023-11-28 19:40:56,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3642453.3333333335, ans=0.125 2023-11-28 19:41:01,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3642453.3333333335, ans=0.05 2023-11-28 19:41:06,880 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5300, loss[loss=0.06033, simple_loss=0.07972, pruned_loss=0.01174, audio_tagging_loss=0.008721, over 16559.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08913, pruned_loss=0.01213, audio_tagging_loss=0.008432, over 3043069.20 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:41:10,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3642520.0, ans=0.2 2023-11-28 19:41:11,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3642520.0, ans=0.0 2023-11-28 19:41:11,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3642520.0, ans=0.125 2023-11-28 19:41:14,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3642520.0, ans=0.125 2023-11-28 19:41:17,006 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:41:17,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-28 19:41:23,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.67 vs. limit=22.5 2023-11-28 19:41:24,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3642586.6666666665, ans=0.125 2023-11-28 19:41:25,542 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 9.027e+01 9.738e+01 1.069e+02 1.273e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 19:41:25,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3642586.6666666665, ans=0.125 2023-11-28 19:41:28,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0 2023-11-28 19:41:29,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.93 vs. 
limit=15.0 2023-11-28 19:41:31,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546400 2023-11-28 19:41:43,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3642720.0, ans=0.025 2023-11-28 19:41:51,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3642720.0, ans=0.0 2023-11-28 19:41:55,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-28 19:42:08,135 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5350, loss[loss=0.07305, simple_loss=0.1009, pruned_loss=0.01331, audio_tagging_loss=0.009311, over 13834.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08961, pruned_loss=0.01216, audio_tagging_loss=0.008501, over 3046003.06 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:42:18,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3642853.3333333335, ans=0.2 2023-11-28 19:42:23,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3642920.0, ans=0.1 2023-11-28 19:42:33,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546450 2023-11-28 19:42:48,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2023-11-28 19:42:58,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3643120.0, ans=0.125 2023-11-28 19:42:59,391 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:43:03,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3643120.0, ans=0.125 2023-11-28 19:43:10,540 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5400, loss[loss=0.06735, simple_loss=0.09104, pruned_loss=0.01126, audio_tagging_loss=0.01057, over 16112.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08934, pruned_loss=0.0122, audio_tagging_loss=0.008685, over 3046256.91 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:43:17,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3643186.6666666665, ans=0.0 2023-11-28 19:43:18,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3643186.6666666665, ans=0.125 2023-11-28 19:43:22,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3643253.3333333335, ans=0.0 2023-11-28 19:43:28,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 9.157e+01 9.833e+01 1.043e+02 1.444e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-28 19:43:30,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0 2023-11-28 19:43:33,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.05 vs. 
limit=12.0 2023-11-28 19:43:34,954 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546500 2023-11-28 19:43:38,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3643320.0, ans=0.0 2023-11-28 19:44:01,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3643453.3333333335, ans=0.0 2023-11-28 19:44:12,576 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5450, loss[loss=0.06945, simple_loss=0.08939, pruned_loss=0.01625, audio_tagging_loss=0.008508, over 15459.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.0896, pruned_loss=0.01235, audio_tagging_loss=0.008736, over 3048048.60 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:44:19,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0 2023-11-28 19:44:37,501 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546550 2023-11-28 19:44:53,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.25 vs. limit=22.5 2023-11-28 19:44:55,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=22.5 2023-11-28 19:44:59,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-11-28 19:45:02,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3643786.6666666665, ans=0.0 2023-11-28 19:45:06,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2023-11-28 19:45:14,837 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5500, loss[loss=0.04547, simple_loss=0.062, pruned_loss=0.00494, audio_tagging_loss=0.009531, over 15344.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08963, pruned_loss=0.01232, audio_tagging_loss=0.008712, over 3049582.66 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:45:22,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0 2023-11-28 19:45:26,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3643920.0, ans=0.0 2023-11-28 19:45:31,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3643920.0, ans=0.2 2023-11-28 19:45:34,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.789e+01 9.570e+01 1.015e+02 1.276e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:45:40,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546600 2023-11-28 19:45:44,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3643986.6666666665, ans=0.0 2023-11-28 19:46:17,732 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5550, loss[loss=0.0755, simple_loss=0.1071, pruned_loss=0.01361, audio_tagging_loss=0.008349, over 15802.00 frames. 
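
The optim.py:476 records report a five-number summary (min, 25%, median, 75%, max) of recent gradient norms plus a clipping threshold; in every record here the threshold equals Clipping_scale (2.0) times the median, e.g. 2.0 x 9.570e+01 = 1.914e+02 above. A hedged sketch of that scheme follows; the class and window size are illustrative, not icefall's exact optimizer code.

```python
# Sketch of median-based gradient clipping consistent with the
# "grad-norm quartiles ... threshold ... percent-clipped" records:
# keep a window of recent global grad norms and clip at
# clipping_scale times the running median.
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:  # such steps count toward "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm
```
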
], tot_loss[loss=0.06562, simple_loss=0.08927, pruned_loss=0.01218, audio_tagging_loss=0.008808, over 3040392.18 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:46:30,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3644253.3333333335, ans=0.0 2023-11-28 19:46:41,302 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546650 2023-11-28 19:46:46,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3644320.0, ans=0.125 2023-11-28 19:46:56,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3644386.6666666665, ans=0.2 2023-11-28 19:47:06,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-11-28 19:47:07,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=22.5 2023-11-28 19:47:11,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3644453.3333333335, ans=0.05 2023-11-28 19:47:18,556 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5600, loss[loss=0.08154, simple_loss=0.1105, pruned_loss=0.01794, audio_tagging_loss=0.00835, over 15529.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09004, pruned_loss=0.01223, audio_tagging_loss=0.008915, over 3047191.11 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:47:22,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-11-28 19:47:26,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0 2023-11-28 19:47:38,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.943e+01 9.603e+01 1.015e+02 1.818e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 19:47:43,575 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546700 2023-11-28 19:48:05,759 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:48:13,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3644786.6666666665, ans=0.0 2023-11-28 19:48:13,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2023-11-28 19:48:20,627 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5650, loss[loss=0.0599, simple_loss=0.07245, pruned_loss=0.01477, audio_tagging_loss=0.008906, over 15730.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08968, pruned_loss=0.01219, audio_tagging_loss=0.00898, over 3049641.33 frames. 
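
The WARNING above drops an AudioSet cut whose transcript is a dummy placeholder: after subsampling only 23 frames remain, fewer than its 24 BPE tokens, and the transducer loss is undefined when the encoder emits fewer frames than there are output tokens. A sketch of such a validity filter follows; the subsampling arithmetic below reproduces the logged 100 -> 23, but treat it as an assumption about this recipe's encoder frontend.

```python
# Sketch of the kind of filter behind the "Exclude cut" warnings:
# a cut is unusable for the transducer loss when the frames left
# after the ~4x convolutional subsampling fall below the token count.
def is_valid_cut(num_frames: int, num_tokens: int) -> bool:
    t = ((num_frames - 7) // 2 + 1) // 2  # frames after subsampling
    return t >= num_tokens

print(is_valid_cut(100, 24))  # False -> excluded, as in the warning
```
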
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:48:24,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3644853.3333333335, ans=0.125 2023-11-28 19:48:26,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3644853.3333333335, ans=0.1 2023-11-28 19:48:30,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3644853.3333333335, ans=0.07 2023-11-28 19:48:39,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0 2023-11-28 19:48:45,930 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546750 2023-11-28 19:49:04,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3645053.3333333335, ans=0.2 2023-11-28 19:49:21,866 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5700, loss[loss=0.06459, simple_loss=0.0913, pruned_loss=0.01026, audio_tagging_loss=0.008682, over 15485.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08892, pruned_loss=0.01193, audio_tagging_loss=0.00896, over 3047260.64 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:49:37,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3645253.3333333335, ans=0.125 2023-11-28 19:49:41,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.661e+01 9.434e+01 1.006e+02 1.407e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 19:49:42,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0 2023-11-28 19:49:46,699 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546800 2023-11-28 19:49:48,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3645320.0, ans=0.1 2023-11-28 19:49:48,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3645320.0, ans=0.125 2023-11-28 19:49:52,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3645320.0, ans=0.1 2023-11-28 19:50:06,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3645386.6666666665, ans=0.0 2023-11-28 19:50:06,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3645386.6666666665, ans=0.125 2023-11-28 19:50:24,573 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5750, loss[loss=0.08875, simple_loss=0.1325, pruned_loss=0.01766, audio_tagging_loss=0.004817, over 16070.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08875, pruned_loss=0.01203, audio_tagging_loss=0.008797, over 3051658.53 frames. 
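
In each train_asr.py:1235 record, the reported loss is consistent with a weighted sum of the three logged components, loss ~= 0.5 * simple_loss + pruned_loss + audio_tagging_loss (e.g. 0.5 * 0.1077 + 0.01626 + 0.004787 ~= 0.07489 in the batch 5200 record). The weights below are inferred from the logged numbers, not read out of the training code; a sketch:

```python
# Sketch of how the logged per-batch loss decomposes. The 0.5 and 1.0
# weights are inferred by fitting the numbers printed in the records
# above; treat them as assumptions about this run's configuration.
def combined_loss(simple_loss: float,
                  pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

print(combined_loss(0.1077, 0.01626, 0.004787))  # ~0.07489, as logged
```
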
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:50:34,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3645520.0, ans=0.125 2023-11-28 19:50:44,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3645586.6666666665, ans=0.0 2023-11-28 19:50:48,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-11-28 19:50:49,881 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546850 2023-11-28 19:51:15,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3645786.6666666665, ans=0.125 2023-11-28 19:51:26,957 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5800, loss[loss=0.06455, simple_loss=0.08628, pruned_loss=0.01051, audio_tagging_loss=0.0109, over 15017.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08887, pruned_loss=0.01198, audio_tagging_loss=0.008709, over 3052776.43 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:51:41,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3645920.0, ans=0.0 2023-11-28 19:51:46,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.935e+01 9.736e+01 1.038e+02 1.216e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 19:51:51,503 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546900 2023-11-28 19:51:54,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2023-11-28 19:51:55,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3645986.6666666665, ans=0.2 2023-11-28 19:51:55,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2023-11-28 19:52:09,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3646053.3333333335, ans=0.125 2023-11-28 19:52:10,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2023-11-28 19:52:13,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3646053.3333333335, ans=0.2 2023-11-28 19:52:18,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.11 vs. limit=6.0 2023-11-28 19:52:28,727 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5850, loss[loss=0.06704, simple_loss=0.09682, pruned_loss=0.0121, audio_tagging_loss=0.006523, over 15316.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08904, pruned_loss=0.01203, audio_tagging_loss=0.008696, over 3055077.98 frames. 
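
The scaling.py:1022 Whitening records compare a per-module whitening metric against a limit; the metric measures how far the module's output covariance is from white (equal eigenvalues), and a value well above the limit would trigger a corrective penalty. The formula below is one plausible proxy for that metric, offered only as an assumption, not icefall's exact computation.

```python
# Sketch of a plausible whitening metric behind the
# "Whitening: ... metric=X vs. limit=Y" records: eigenvalue spread of
# the feature covariance, equal to 1.0 for perfectly white features.
import torch

def whitening_metric(feats: torch.Tensor) -> float:
    # feats: (num_frames, num_channels)
    feats = feats - feats.mean(dim=0, keepdim=True)
    cov = feats.t() @ feats / feats.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 256)                 # near-white features
print(whitening_metric(x))                 # modestly above 1.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 256)))  # much larger
```
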
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:52:31,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3646186.6666666665, ans=0.125 2023-11-28 19:52:42,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3646253.3333333335, ans=0.125 2023-11-28 19:52:48,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3646253.3333333335, ans=0.125 2023-11-28 19:52:53,846 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546950 2023-11-28 19:53:30,695 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5900, loss[loss=0.05644, simple_loss=0.07714, pruned_loss=0.007924, audio_tagging_loss=0.009946, over 15032.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08967, pruned_loss=0.01215, audio_tagging_loss=0.008533, over 3048719.39 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:53:32,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3646520.0, ans=0.125 2023-11-28 19:53:38,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3646520.0, ans=0.125 2023-11-28 19:53:50,822 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.034e+01 9.568e+01 1.043e+02 1.665e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:53:51,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3646586.6666666665, ans=0.125 2023-11-28 19:53:54,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3646653.3333333335, ans=0.1 2023-11-28 19:53:55,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547000 2023-11-28 19:53:57,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=12.0 2023-11-28 19:54:16,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3646720.0, ans=0.1 2023-11-28 19:54:21,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3646786.6666666665, ans=0.0 2023-11-28 19:54:26,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3646786.6666666665, ans=0.125 2023-11-28 19:54:33,336 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5950, loss[loss=0.0673, simple_loss=0.09198, pruned_loss=0.013, audio_tagging_loss=0.008315, over 15735.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08892, pruned_loss=0.01203, audio_tagging_loss=0.008589, over 3048420.31 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:54:40,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.63 vs. 
limit=22.5 2023-11-28 19:54:41,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3646853.3333333335, ans=0.1 2023-11-28 19:54:50,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3646920.0, ans=0.0 2023-11-28 19:54:58,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547050 2023-11-28 19:55:00,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3646986.6666666665, ans=0.125 2023-11-28 19:55:20,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3647053.3333333335, ans=15.0 2023-11-28 19:55:29,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3647120.0, ans=0.035 2023-11-28 19:55:35,008 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6000, loss[loss=0.06131, simple_loss=0.07716, pruned_loss=0.01309, audio_tagging_loss=0.009646, over 15253.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08832, pruned_loss=0.01199, audio_tagging_loss=0.008579, over 3045214.57 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:55:35,009 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 19:56:10,939 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7954, 5.0001, 5.0724, 4.9274], device='cuda:3') 2023-11-28 19:56:14,879 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05742, simple_loss=0.05049, pruned_loss=0.005198, audio_tagging_loss=0.02698, over 4681554.00 frames. 2023-11-28 19:56:14,880 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 19:56:27,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3647253.3333333335, ans=0.125 2023-11-28 19:56:34,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.785e+01 9.507e+01 1.045e+02 2.026e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 19:56:39,348 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547100 2023-11-28 19:56:51,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3647386.6666666665, ans=0.125 2023-11-28 19:56:54,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2023-11-28 19:57:01,552 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:57:02,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=12.0 2023-11-28 19:57:16,622 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6050, loss[loss=0.06362, simple_loss=0.08046, pruned_loss=0.01304, audio_tagging_loss=0.01035, over 15663.00 frames. 
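
The batch 6000 records above interleave a validation pass (train_asr.py:1258/1267) with training and then report peak GPU memory. A sketch of the implied loop follows; compute_loss, the frame-weighted averaging and the logging format are stand-ins for the recipe's actual code.

```python
# Sketch of the validation pass implied by the records above: average
# the loss over the dev loader without gradients, then report peak
# GPU memory. compute_loss is a hypothetical stand-in.
import torch

def validate(model, dev_loader, compute_loss, device="cuda:3"):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.5f}; "
          f"Maximum memory allocated so far is {peak_mb}MB")
```
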
], tot_loss[loss=0.06476, simple_loss=0.0882, pruned_loss=0.01205, audio_tagging_loss=0.008608, over 3051659.44 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:57:36,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3647586.6666666665, ans=0.2 2023-11-28 19:57:41,457 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547150 2023-11-28 19:57:42,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3647653.3333333335, ans=0.125 2023-11-28 19:58:01,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3647720.0, ans=0.125 2023-11-28 19:58:06,339 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:58:15,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3647786.6666666665, ans=0.125 2023-11-28 19:58:18,271 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6100, loss[loss=0.08068, simple_loss=0.1055, pruned_loss=0.01925, audio_tagging_loss=0.008694, over 14329.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08934, pruned_loss=0.01217, audio_tagging_loss=0.008566, over 3051013.56 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:58:32,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=3647920.0, ans=15.0 2023-11-28 19:58:39,055 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.974e+01 9.540e+01 1.034e+02 1.321e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 19:58:42,833 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547200 2023-11-28 19:58:43,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3647986.6666666665, ans=0.125 2023-11-28 19:59:03,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3648053.3333333335, ans=0.125 2023-11-28 19:59:11,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2023-11-28 19:59:20,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.03 vs. limit=10.0 2023-11-28 19:59:20,934 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6150, loss[loss=0.06518, simple_loss=0.08772, pruned_loss=0.01228, audio_tagging_loss=0.009044, over 15481.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08904, pruned_loss=0.01215, audio_tagging_loss=0.008691, over 3056299.65 frames. 
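
The lr: 1.47e-03 printed in every batch record is consistent with the Eden schedule these zipformer recipes use, which decays the base LR smoothly in both batch count and epoch. The sketch below, evaluated with this run's configured base LR and schedule constants, lands close to the logged value; treat the exact formula as an assumption about this icefall version.

```python
# Sketch of the Eden learning-rate schedule; with base_lr=0.045,
# lr_batches=7500 and lr_epochs=3.5 (this run's startup config) it
# approximately reproduces the lr shown in the records above.
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, batch=547000, epoch=46):.2e}")
# ~1.45e-03, close to the logged 1.47e-03 (the scheduler may use a
# fractional epoch or a slightly different batch count)
```
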
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:59:35,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3648253.3333333335, ans=0.125 2023-11-28 19:59:37,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3648253.3333333335, ans=0.0 2023-11-28 19:59:40,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3648253.3333333335, ans=0.125 2023-11-28 19:59:41,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3648253.3333333335, ans=0.0 2023-11-28 19:59:45,991 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547250 2023-11-28 19:59:46,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3648320.0, ans=0.125 2023-11-28 19:59:53,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3648320.0, ans=0.125 2023-11-28 20:00:21,941 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6200, loss[loss=0.07432, simple_loss=0.102, pruned_loss=0.01506, audio_tagging_loss=0.00827, over 14973.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08884, pruned_loss=0.01211, audio_tagging_loss=0.008764, over 3054316.99 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:00:27,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3648520.0, ans=0.0 2023-11-28 20:00:29,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0 2023-11-28 20:00:41,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3648586.6666666665, ans=0.125 2023-11-28 20:00:42,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.739e+01 9.716e+01 1.027e+02 1.338e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-28 20:00:47,111 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547300 2023-11-28 20:00:48,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2023-11-28 20:01:01,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3648720.0, ans=0.0 2023-11-28 20:01:03,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3648720.0, ans=0.125 2023-11-28 20:01:18,730 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:01:23,797 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6250, loss[loss=0.08688, simple_loss=0.1228, pruned_loss=0.01654, audio_tagging_loss=0.008932, over 16262.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08927, pruned_loss=0.01207, audio_tagging_loss=0.008777, over 3058028.71 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:01:44,270 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:01:47,657 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547350 2023-11-28 20:02:00,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3649053.3333333335, ans=0.1 2023-11-28 20:02:05,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3649053.3333333335, ans=0.0 2023-11-28 20:02:06,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-11-28 20:02:06,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3649053.3333333335, ans=0.125 2023-11-28 20:02:25,157 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6300, loss[loss=0.04867, simple_loss=0.0661, pruned_loss=0.008005, audio_tagging_loss=0.007616, over 14308.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08863, pruned_loss=0.01191, audio_tagging_loss=0.00887, over 3056881.42 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:02:34,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2023-11-28 20:02:44,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3649253.3333333335, ans=0.0 2023-11-28 20:02:45,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.912e+01 9.440e+01 1.014e+02 1.345e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 20:02:49,502 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547400 2023-11-28 20:03:26,070 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6350, loss[loss=0.06301, simple_loss=0.08738, pruned_loss=0.009875, audio_tagging_loss=0.00944, over 15132.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08818, pruned_loss=0.01185, audio_tagging_loss=0.00903, over 3055959.64 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:03:32,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3649520.0, ans=0.125 2023-11-28 20:03:52,118 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547450 2023-11-28 20:04:09,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3649720.0, ans=0.125 2023-11-28 20:04:16,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3649786.6666666665, ans=0.125 2023-11-28 20:04:19,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3649786.6666666665, ans=0.0 2023-11-28 20:04:28,181 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6400, loss[loss=0.08597, simple_loss=0.1252, pruned_loss=0.01701, audio_tagging_loss=0.006369, over 15752.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08944, pruned_loss=0.01204, audio_tagging_loss=0.008937, over 3052305.35 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:04:30,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-11-28 20:04:33,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3649853.3333333335, ans=0.0 2023-11-28 20:04:34,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3649853.3333333335, ans=0.1 2023-11-28 20:04:35,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649853.3333333335, ans=0.1 2023-11-28 20:04:39,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3649853.3333333335, ans=0.125 2023-11-28 20:04:49,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.001e+01 9.641e+01 1.036e+02 1.339e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 20:04:52,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547500 2023-11-28 20:05:05,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2023-11-28 20:05:22,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3650120.0, ans=0.09899494936611666 2023-11-28 20:05:24,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3650120.0, ans=0.0 2023-11-28 20:05:30,121 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6450, loss[loss=0.07628, simple_loss=0.1025, pruned_loss=0.01639, audio_tagging_loss=0.008625, over 14135.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08959, pruned_loss=0.0121, audio_tagging_loss=0.00904, over 3052892.36 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:05:42,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3650253.3333333335, ans=0.015 2023-11-28 20:05:45,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3650253.3333333335, ans=0.125 2023-11-28 20:05:53,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3650320.0, ans=0.0 2023-11-28 20:05:54,433 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547550 2023-11-28 20:06:11,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3650386.6666666665, ans=0.125 2023-11-28 20:06:26,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.60 vs. 
limit=12.0 2023-11-28 20:06:27,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3650453.3333333335, ans=0.0 2023-11-28 20:06:28,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3650453.3333333335, ans=0.5 2023-11-28 20:06:30,885 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6500, loss[loss=0.05236, simple_loss=0.07022, pruned_loss=0.006332, audio_tagging_loss=0.01092, over 15609.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08905, pruned_loss=0.01192, audio_tagging_loss=0.009081, over 3047747.45 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:06:31,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3650520.0, ans=0.0 2023-11-28 20:06:54,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.749e+01 9.341e+01 9.951e+01 1.412e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 20:06:54,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-11-28 20:06:55,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3650653.3333333335, ans=0.2 2023-11-28 20:06:56,616 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547600 2023-11-28 20:07:08,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3650720.0, ans=0.2 2023-11-28 20:07:14,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3650720.0, ans=0.125 2023-11-28 20:07:33,354 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6550, loss[loss=0.07549, simple_loss=0.112, pruned_loss=0.01402, audio_tagging_loss=0.005463, over 14600.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08906, pruned_loss=0.01202, audio_tagging_loss=0.008894, over 3046497.01 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:07:58,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547650 2023-11-28 20:07:59,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3650986.6666666665, ans=0.125 2023-11-28 20:08:29,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3651120.0, ans=0.1 2023-11-28 20:08:35,799 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6600, loss[loss=0.06359, simple_loss=0.09463, pruned_loss=0.01026, audio_tagging_loss=0.006014, over 14899.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08903, pruned_loss=0.01213, audio_tagging_loss=0.008729, over 3043600.22 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:08:47,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651253.3333333335, ans=0.1 2023-11-28 20:08:56,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3651253.3333333335, ans=0.0 2023-11-28 20:08:58,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.741e+01 8.944e+01 9.598e+01 1.039e+02 1.454e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 20:09:01,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547700 2023-11-28 20:09:01,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3651320.0, ans=0.125 2023-11-28 20:09:28,653 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.69 vs. limit=15.0 2023-11-28 20:09:29,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3651453.3333333335, ans=0.2 2023-11-28 20:09:29,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3651453.3333333335, ans=0.0 2023-11-28 20:09:30,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3651453.3333333335, ans=0.015 2023-11-28 20:09:38,385 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6650, loss[loss=0.05527, simple_loss=0.07604, pruned_loss=0.008739, audio_tagging_loss=0.008506, over 16374.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08813, pruned_loss=0.01184, audio_tagging_loss=0.008687, over 3037980.99 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:09:40,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2023-11-28 20:09:47,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3651520.0, ans=0.2 2023-11-28 20:09:53,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3651586.6666666665, ans=0.125 2023-11-28 20:10:03,137 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547750 2023-11-28 20:10:12,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3651653.3333333335, ans=0.125 2023-11-28 20:10:22,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3651720.0, ans=0.125 2023-11-28 20:10:34,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3651786.6666666665, ans=0.0 2023-11-28 20:10:39,456 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6700, loss[loss=0.06199, simple_loss=0.08214, pruned_loss=0.009263, audio_tagging_loss=0.01166, over 15112.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.0885, pruned_loss=0.01181, audio_tagging_loss=0.008606, over 3039774.11 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:10:51,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3651920.0, ans=10.0 2023-11-28 20:10:58,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651920.0, ans=0.1 2023-11-28 20:11:02,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.565e+01 9.256e+01 9.960e+01 1.372e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 20:11:04,733 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547800 2023-11-28 20:11:27,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3652053.3333333335, ans=0.125 2023-11-28 20:11:33,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3652120.0, ans=0.0 2023-11-28 20:11:38,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652120.0, ans=0.1 2023-11-28 20:11:42,287 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6750, loss[loss=0.07249, simple_loss=0.1081, pruned_loss=0.01232, audio_tagging_loss=0.006098, over 14752.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08837, pruned_loss=0.01183, audio_tagging_loss=0.008635, over 3038865.78 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:11:47,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3652186.6666666665, ans=0.2 2023-11-28 20:11:48,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3652186.6666666665, ans=0.125 2023-11-28 20:11:49,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2023-11-28 20:11:59,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3652253.3333333335, ans=0.0 2023-11-28 20:12:06,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547850 2023-11-28 20:12:41,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3652453.3333333335, ans=0.125 2023-11-28 20:12:43,515 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6800, loss[loss=0.05176, simple_loss=0.07354, pruned_loss=0.006183, audio_tagging_loss=0.008809, over 14789.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08935, pruned_loss=0.01201, audio_tagging_loss=0.008561, over 3030987.34 frames. 
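
The grad_scale values flipping between 16.0 and 32.0 in these records are the dynamic loss scale of fp16 training: it doubles after a stretch of overflow-free steps and halves on overflow. icefall wraps its own scaler, but the standard PyTorch equivalent looks roughly like the sketch below; the optimizer, model and batch handling are illustrative.

```python
# Sketch of the fp16 loss-scaling loop reflected by the grad_scale
# values in the records above, using PyTorch's stock GradScaler.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def training_step(model, optimizer, compute_loss, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales, skips step on overflow
    scaler.update()                 # grow or shrink the scale
    return scaler.get_scale()       # the value logged as grad_scale
```
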
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:12:48,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3652520.0, ans=0.0 2023-11-28 20:12:49,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3652520.0, ans=0.125 2023-11-28 20:12:55,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3652586.6666666665, ans=0.0 2023-11-28 20:13:05,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.847e+01 9.241e+01 1.022e+02 1.257e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 20:13:07,811 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547900 2023-11-28 20:13:19,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652720.0, ans=0.1 2023-11-28 20:13:31,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3652786.6666666665, ans=0.0 2023-11-28 20:13:44,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.30 vs. limit=22.5 2023-11-28 20:13:45,135 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6850, loss[loss=0.07414, simple_loss=0.09449, pruned_loss=0.01573, audio_tagging_loss=0.01116, over 14490.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.0889, pruned_loss=0.01198, audio_tagging_loss=0.008549, over 3033861.17 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:13:46,978 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=22.5 2023-11-28 20:13:50,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3652853.3333333335, ans=0.07 2023-11-28 20:14:10,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547950 2023-11-28 20:14:31,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-28 20:14:39,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3653120.0, ans=0.125 2023-11-28 20:14:43,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2023-11-28 20:14:44,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-28 20:14:46,443 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6900, loss[loss=0.06328, simple_loss=0.07965, pruned_loss=0.009731, audio_tagging_loss=0.01372, over 15870.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08977, pruned_loss=0.01214, audio_tagging_loss=0.00844, over 3047327.27 frames. 
], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:15:01,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3653253.3333333335, ans=0.125 2023-11-28 20:15:09,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 9.041e+01 9.811e+01 1.026e+02 3.153e+02, threshold=1.962e+02, percent-clipped=1.0 2023-11-28 20:15:09,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3653320.0, ans=0.125 2023-11-28 20:15:11,015 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548000 2023-11-28 20:15:11,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3653320.0, ans=0.2 2023-11-28 20:15:16,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3653320.0, ans=0.0 2023-11-28 20:15:35,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3653386.6666666665, ans=0.125 2023-11-28 20:15:37,478 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:15:39,537 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:15:42,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3653453.3333333335, ans=0.125 2023-11-28 20:15:49,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3653520.0, ans=0.125 2023-11-28 20:15:50,583 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6950, loss[loss=0.05155, simple_loss=0.06729, pruned_loss=0.009815, audio_tagging_loss=0.00809, over 14686.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08972, pruned_loss=0.01197, audio_tagging_loss=0.008492, over 3048922.41 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:16:14,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548050 2023-11-28 20:16:22,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3653653.3333333335, ans=0.1 2023-11-28 20:16:24,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3653653.3333333335, ans=0.1 2023-11-28 20:16:36,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3653720.0, ans=0.2 2023-11-28 20:16:51,959 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7000, loss[loss=0.05518, simple_loss=0.0737, pruned_loss=0.01071, audio_tagging_loss=0.007618, over 16147.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09014, pruned_loss=0.01207, audio_tagging_loss=0.008624, over 3050828.83 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:17:03,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3653920.0, ans=0.1 2023-11-28 20:17:15,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.929e+01 9.464e+01 1.026e+02 1.498e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 20:17:16,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548100 2023-11-28 20:17:31,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3654053.3333333335, ans=0.5 2023-11-28 20:17:53,498 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7050, loss[loss=0.05947, simple_loss=0.06666, pruned_loss=0.01484, audio_tagging_loss=0.0113, over 15554.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.0895, pruned_loss=0.012, audio_tagging_loss=0.008717, over 3059140.23 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:18:09,437 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:18:18,689 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548150 2023-11-28 20:18:23,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3654320.0, ans=0.125 2023-11-28 20:18:55,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3654520.0, ans=0.125 2023-11-28 20:18:56,450 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7100, loss[loss=0.0692, simple_loss=0.09845, pruned_loss=0.01271, audio_tagging_loss=0.007259, over 14970.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08948, pruned_loss=0.01204, audio_tagging_loss=0.008735, over 3055743.72 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:19:11,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3654586.6666666665, ans=0.025 2023-11-28 20:19:17,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3654586.6666666665, ans=0.09899494936611666 2023-11-28 20:19:17,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2023-11-28 20:19:19,318 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 9.027e+01 9.544e+01 1.031e+02 1.344e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 20:19:20,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548200 2023-11-28 20:19:21,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. 
limit=22.5 2023-11-28 20:19:22,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654653.3333333335, ans=0.1 2023-11-28 20:19:24,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3654653.3333333335, ans=0.0 2023-11-28 20:19:25,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3654653.3333333335, ans=0.125 2023-11-28 20:19:26,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3654653.3333333335, ans=0.0 2023-11-28 20:19:28,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3654653.3333333335, ans=0.125 2023-11-28 20:19:29,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.21 vs. limit=22.5 2023-11-28 20:19:42,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3654720.0, ans=0.125 2023-11-28 20:19:43,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3654720.0, ans=0.1 2023-11-28 20:19:47,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3654786.6666666665, ans=10.0 2023-11-28 20:19:58,792 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7150, loss[loss=0.06238, simple_loss=0.08687, pruned_loss=0.009161, audio_tagging_loss=0.009784, over 15533.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08948, pruned_loss=0.01194, audio_tagging_loss=0.008822, over 3055691.14 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:20:23,164 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548250 2023-11-28 20:20:37,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3655053.3333333335, ans=0.2 2023-11-28 20:20:59,404 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7200, loss[loss=0.0541, simple_loss=0.06369, pruned_loss=0.008745, audio_tagging_loss=0.0135, over 14248.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08854, pruned_loss=0.01192, audio_tagging_loss=0.008902, over 3058231.10 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:21:13,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3655253.3333333335, ans=0.0 2023-11-28 20:21:24,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.059e+01 9.674e+01 1.051e+02 1.523e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 20:21:24,428 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548300 2023-11-28 20:21:28,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3655320.0, ans=0.95 2023-11-28 20:21:49,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3655453.3333333335, ans=0.0 2023-11-28 20:21:50,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.03 vs. limit=22.5 2023-11-28 20:21:51,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3655453.3333333335, ans=0.1 2023-11-28 20:22:01,295 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7250, loss[loss=0.08813, simple_loss=0.1256, pruned_loss=0.01924, audio_tagging_loss=0.00611, over 13982.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08876, pruned_loss=0.01194, audio_tagging_loss=0.008905, over 3050227.20 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:22:06,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3655520.0, ans=0.0 2023-11-28 20:22:26,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548350 2023-11-28 20:22:42,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3655720.0, ans=0.0 2023-11-28 20:22:45,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3655720.0, ans=0.125 2023-11-28 20:22:45,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3655720.0, ans=0.0 2023-11-28 20:22:55,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=12.0 2023-11-28 20:23:03,244 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7300, loss[loss=0.06808, simple_loss=0.09491, pruned_loss=0.01035, audio_tagging_loss=0.01027, over 16704.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08941, pruned_loss=0.01202, audio_tagging_loss=0.0088, over 3046031.26 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:23:12,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. 
limit=15.0 2023-11-28 20:23:27,657 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.815e+01 9.477e+01 1.035e+02 1.367e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 20:23:27,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548400 2023-11-28 20:23:34,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3655986.6666666665, ans=0.0 2023-11-28 20:23:34,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3655986.6666666665, ans=0.125 2023-11-28 20:23:43,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2023-11-28 20:23:44,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3656053.3333333335, ans=0.125 2023-11-28 20:24:04,866 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7350, loss[loss=0.05641, simple_loss=0.08254, pruned_loss=0.007568, audio_tagging_loss=0.007575, over 16049.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08859, pruned_loss=0.01181, audio_tagging_loss=0.008758, over 3043031.48 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:24:14,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=22.5 2023-11-28 20:24:21,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3656253.3333333335, ans=0.125 2023-11-28 20:24:24,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3656253.3333333335, ans=0.0 2023-11-28 20:24:29,511 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548450 2023-11-28 20:24:30,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3656320.0, ans=0.125 2023-11-28 20:25:06,607 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7400, loss[loss=0.0538, simple_loss=0.06805, pruned_loss=0.0106, audio_tagging_loss=0.009171, over 14576.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08815, pruned_loss=0.01193, audio_tagging_loss=0.008672, over 3036891.27 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:25:30,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3656653.3333333335, ans=0.0 2023-11-28 20:25:30,863 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.862e+01 9.547e+01 1.020e+02 1.427e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 20:25:30,982 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548500 2023-11-28 20:25:51,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3656720.0, ans=0.125 2023-11-28 20:26:06,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3656853.3333333335, ans=0.125 2023-11-28 20:26:07,290 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7450, loss[loss=0.0596, simple_loss=0.0845, pruned_loss=0.008493, audio_tagging_loss=0.008852, over 14606.00 frames. 
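[Note on the dense `scaling.py:213` traffic: these track `ScheduledFloat` hyperparameters. Dropout probabilities, bypass/attention `skip_rate`s, balancer bounds and the like are not constants in this model but functions of the global `batch_count`, and `ans=` is the value currently in effect. A rough sketch of the idea, assuming simple piecewise-linear interpolation between (batch_count, value) breakpoints (the real class carries more machinery, e.g. defaults and arithmetic on schedules):

    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        """Piecewise-linear interpolation over (batch_count, value) points."""
        x0, y0 = schedule[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in schedule[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint: hold the final value

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches
    # has long since settled at 0.1 by batch_count ~ 3.65e6, which matches
    # the constant ans=0.1 dropout_p values logged this deep into training.
]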
], tot_loss[loss=0.06487, simple_loss=0.0885, pruned_loss=0.01207, audio_tagging_loss=0.008555, over 3032833.31 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:26:18,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3656853.3333333335, ans=0.0 2023-11-28 20:26:19,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3656920.0, ans=0.125 2023-11-28 20:26:32,752 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548550 2023-11-28 20:26:40,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=22.5 2023-11-28 20:26:44,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3657053.3333333335, ans=0.1 2023-11-28 20:26:56,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3657120.0, ans=0.125 2023-11-28 20:26:58,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0 2023-11-28 20:27:00,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3657120.0, ans=0.5 2023-11-28 20:27:01,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3657120.0, ans=0.015 2023-11-28 20:27:05,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3657120.0, ans=0.07 2023-11-28 20:27:09,851 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7500, loss[loss=0.05743, simple_loss=0.07636, pruned_loss=0.008618, audio_tagging_loss=0.01063, over 15883.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08829, pruned_loss=0.01196, audio_tagging_loss=0.008544, over 3038122.25 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:27:11,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-28 20:27:12,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3657186.6666666665, ans=0.125 2023-11-28 20:27:16,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2023-11-28 20:27:17,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3657186.6666666665, ans=0.125 2023-11-28 20:27:19,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3657186.6666666665, ans=0.0 2023-11-28 20:27:24,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3657253.3333333335, ans=0.0 2023-11-28 20:27:28,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.44 vs. 
limit=6.0 2023-11-28 20:27:34,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.864e+01 9.552e+01 1.022e+02 1.615e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 20:27:34,231 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548600 2023-11-28 20:27:47,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3657386.6666666665, ans=0.125 2023-11-28 20:28:12,513 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7550, loss[loss=0.05818, simple_loss=0.07948, pruned_loss=0.01192, audio_tagging_loss=0.00652, over 14377.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08835, pruned_loss=0.01198, audio_tagging_loss=0.008527, over 3039095.18 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:28:16,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3657520.0, ans=0.025 2023-11-28 20:28:30,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3657586.6666666665, ans=0.125 2023-11-28 20:28:37,414 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548650 2023-11-28 20:28:40,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3657653.3333333335, ans=0.125 2023-11-28 20:28:49,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3657720.0, ans=0.125 2023-11-28 20:28:58,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2023-11-28 20:29:13,266 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7600, loss[loss=0.06204, simple_loss=0.07971, pruned_loss=0.01157, audio_tagging_loss=0.01061, over 13791.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08739, pruned_loss=0.0118, audio_tagging_loss=0.008505, over 3044076.14 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:29:13,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3657853.3333333335, ans=0.0 2023-11-28 20:29:38,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.036e+01 9.725e+01 1.055e+02 1.335e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 20:29:39,050 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548700 2023-11-28 20:29:47,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3657986.6666666665, ans=0.125 2023-11-28 20:29:50,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3658053.3333333335, ans=0.125 2023-11-28 20:29:57,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3658053.3333333335, ans=0.125 2023-11-28 20:30:09,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658120.0, ans=0.1 2023-11-28 20:30:15,864 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7650, loss[loss=0.0618, simple_loss=0.08834, pruned_loss=0.00988, audio_tagging_loss=0.007752, over 15859.00 frames. 
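[Note on the `optim.py:476` lines: this is the optimizer's gradient-clipping telemetry. It keeps a buffer of recent overall gradient norms, prints their quartiles (min / 25% / median / 75% / max), and derives the clipping threshold from that history. In every entry in this section the threshold equals `Clipping_scale` (2.0) times the median quartile — e.g. 2.0 x 9.552e+01 = 1.910e+02 just above — and `percent-clipped=0.0` says no recent batch exceeded it. A hedged sketch of that bookkeeping (not the actual ScaledAdam implementation):

    import torch

    def clipping_factor(norm_history: list[float],
                        grad_norm: float,
                        clipping_scale: float = 2.0) -> float:
        """Scale factor to apply to gradients under a median-based threshold."""
        qs = torch.tensor(norm_history).quantile(
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * qs[2].item()  # 2.0 x median, as logged
        return min(1.0, threshold / max(grad_norm, 1e-20))
]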
], tot_loss[loss=0.064, simple_loss=0.0875, pruned_loss=0.01183, audio_tagging_loss=0.008426, over 3039205.15 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:30:40,703 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548750 2023-11-28 20:31:08,140 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:31:17,766 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7700, loss[loss=0.08618, simple_loss=0.1128, pruned_loss=0.02042, audio_tagging_loss=0.00936, over 15026.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08882, pruned_loss=0.01203, audio_tagging_loss=0.008362, over 3037127.39 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:31:30,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3658586.6666666665, ans=0.125 2023-11-28 20:31:42,364 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548800 2023-11-28 20:31:43,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.090e+01 9.039e+01 9.612e+01 1.044e+02 1.351e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 20:31:52,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3658653.3333333335, ans=0.0 2023-11-28 20:32:04,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=22.5 2023-11-28 20:32:19,657 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7750, loss[loss=0.05965, simple_loss=0.07613, pruned_loss=0.01197, audio_tagging_loss=0.009612, over 16787.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08858, pruned_loss=0.01188, audio_tagging_loss=0.008467, over 3034444.27 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:32:22,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3658853.3333333335, ans=0.0 2023-11-28 20:32:44,860 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548850 2023-11-28 20:32:49,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3658986.6666666665, ans=0.5 2023-11-28 20:33:22,036 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7800, loss[loss=0.07707, simple_loss=0.1058, pruned_loss=0.01589, audio_tagging_loss=0.008262, over 14919.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08974, pruned_loss=0.01208, audio_tagging_loss=0.008564, over 3038886.58 frames. 
], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:33:43,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3659253.3333333335, ans=0.1 2023-11-28 20:33:47,432 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548900 2023-11-28 20:33:48,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.914e+01 9.757e+01 1.064e+02 1.306e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-28 20:33:55,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3659320.0, ans=0.0 2023-11-28 20:34:10,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3659453.3333333335, ans=0.125 2023-11-28 20:34:14,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3659453.3333333335, ans=0.125 2023-11-28 20:34:19,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3659453.3333333335, ans=0.0 2023-11-28 20:34:24,316 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7850, loss[loss=0.0689, simple_loss=0.09712, pruned_loss=0.0127, audio_tagging_loss=0.007637, over 16390.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08905, pruned_loss=0.01201, audio_tagging_loss=0.008599, over 3036171.36 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:34:41,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3659586.6666666665, ans=0.125 2023-11-28 20:34:49,045 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548950 2023-11-28 20:34:59,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5 2023-11-28 20:35:22,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3659786.6666666665, ans=0.125 2023-11-28 20:35:25,335 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7900, loss[loss=0.08969, simple_loss=0.1358, pruned_loss=0.01511, audio_tagging_loss=0.006665, over 16621.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08887, pruned_loss=0.01209, audio_tagging_loss=0.008666, over 3043978.12 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:35:31,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2023-11-28 20:35:35,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3659853.3333333335, ans=0.2 2023-11-28 20:35:49,649 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549000 2023-11-28 20:35:50,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 9.083e+01 9.481e+01 1.023e+02 1.467e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 20:36:15,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2023-11-28 20:36:26,836 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7950, loss[loss=0.07193, simple_loss=0.094, pruned_loss=0.01493, audio_tagging_loss=0.01, over 15059.00 frames. 
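[Note on the `scaling.py:1022` `Whitening:` entries: these come from the Whiten modules that keep activation covariances from collapsing. Each measures a whitening metric on its input — how far the feature covariance is from a scaled identity, with 1.0 meaning perfectly white — and only applies its corrective gradient when the metric exceeds the configured limit, so a line like `metric=8.69 vs. limit=15.0` means the penalty is currently inactive. (The occasional `scaling.py:1118` `WithLoss:` lines appear to be sibling diagnostics attached to attention weights; at `loss-sum=0.000e+00` they are likewise inactive.) A simplified version of such a metric, under the assumption that it is the covariance's normalized second moment; the in-tree computation differs in details such as channel grouping:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """>= 1.0, and equal to 1.0 iff the covariance is a multiple of I."""
        x = x.reshape(-1, x.shape[-1])
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        return (cov * cov).sum() * d / cov.diagonal().sum() ** 2

    # whitening_metric(torch.randn(1000, 512)) is close to 1.0 (white),
    # while strongly correlated features push the value toward d.
]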
], tot_loss[loss=0.06569, simple_loss=0.0895, pruned_loss=0.01221, audio_tagging_loss=0.00873, over 3045195.15 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:36:37,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3660253.3333333335, ans=0.125 2023-11-28 20:36:43,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3660253.3333333335, ans=0.125 2023-11-28 20:36:44,694 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:36:47,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3660253.3333333335, ans=0.0 2023-11-28 20:36:48,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3660253.3333333335, ans=0.125 2023-11-28 20:36:52,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549050 2023-11-28 20:36:58,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3660320.0, ans=0.0 2023-11-28 20:37:15,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2023-11-28 20:37:18,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3660453.3333333335, ans=0.125 2023-11-28 20:37:19,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2023-11-28 20:37:20,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3660453.3333333335, ans=0.1 2023-11-28 20:37:21,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3660453.3333333335, ans=0.0 2023-11-28 20:37:28,995 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8000, loss[loss=0.04465, simple_loss=0.05231, pruned_loss=0.007473, audio_tagging_loss=0.01103, over 15110.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0895, pruned_loss=0.01217, audio_tagging_loss=0.008794, over 3044433.48 frames. 
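[Note on the `train_asr.py:1481` WARNING above: this is the on-the-fly filtering of cuts that cannot be trained on. AudioSet clips have no transcript, so a fixed dummy sentence is attached; a 1-second clip yields 100 feature frames, i.e. 23 encoder frames after the roughly 4x subsampling front end — fewer than the 24 BPE tokens of the dummy text. The pruned-transducer loss used here needs at least as many encoder frames as output symbols, so the cut is excluded from the ASR loss at batch time (it still sits in the mux'ed CutSet, hence a runtime warning rather than a dataset fix). A sketch of the filter, where the exact subsampling formula is an assumption chosen to reproduce the logged 100 -> 23 mapping:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts whose encoder output is shorter than the token sequence."""
        frames_after_subsampling = (num_frames - 8) // 4  # 100 -> 23, as logged
        return frames_after_subsampling >= num_tokens
]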
], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:37:53,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549100 2023-11-28 20:37:54,806 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.902e+01 9.396e+01 1.018e+02 1.315e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 20:38:09,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3660720.0, ans=0.2 2023-11-28 20:38:21,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3660786.6666666665, ans=0.125 2023-11-28 20:38:26,666 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:38:31,165 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8050, loss[loss=0.0677, simple_loss=0.09299, pruned_loss=0.01213, audio_tagging_loss=0.009069, over 15080.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08883, pruned_loss=0.01217, audio_tagging_loss=0.008869, over 3047748.58 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:38:34,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2023-11-28 20:38:50,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3660920.0, ans=0.125 2023-11-28 20:38:55,141 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549150 2023-11-28 20:39:04,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3660986.6666666665, ans=0.04949747468305833 2023-11-28 20:39:04,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3660986.6666666665, ans=0.125 2023-11-28 20:39:05,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3660986.6666666665, ans=0.2 2023-11-28 20:39:32,423 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8100, loss[loss=0.06546, simple_loss=0.08579, pruned_loss=0.01268, audio_tagging_loss=0.009885, over 15617.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08776, pruned_loss=0.01188, audio_tagging_loss=0.008802, over 3047755.61 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:39:37,653 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.70 vs. 
limit=15.0 2023-11-28 20:39:43,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3661253.3333333335, ans=15.0 2023-11-28 20:39:51,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3661253.3333333335, ans=0.0 2023-11-28 20:39:56,912 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549200 2023-11-28 20:40:00,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 9.129e+01 9.751e+01 1.053e+02 1.304e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 20:40:31,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3661453.3333333335, ans=0.125 2023-11-28 20:40:34,202 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8150, loss[loss=0.07512, simple_loss=0.09952, pruned_loss=0.01735, audio_tagging_loss=0.008008, over 14827.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08798, pruned_loss=0.01198, audio_tagging_loss=0.008728, over 3046342.81 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:40:38,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3661520.0, ans=0.0 2023-11-28 20:40:58,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549250 2023-11-28 20:41:00,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3661653.3333333335, ans=0.5 2023-11-28 20:41:23,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3661786.6666666665, ans=0.1 2023-11-28 20:41:28,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3661786.6666666665, ans=0.2 2023-11-28 20:41:35,431 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8200, loss[loss=0.06037, simple_loss=0.0891, pruned_loss=0.007198, audio_tagging_loss=0.008626, over 15035.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08806, pruned_loss=0.01191, audio_tagging_loss=0.008671, over 3043804.29 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:41:36,741 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:41:43,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3661853.3333333335, ans=0.125 2023-11-28 20:41:59,781 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549300 2023-11-28 20:42:02,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.719e+01 9.487e+01 1.033e+02 1.453e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 20:42:08,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.31 vs. 
limit=15.0 2023-11-28 20:42:35,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3662186.6666666665, ans=0.2 2023-11-28 20:42:36,664 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8250, loss[loss=0.06934, simple_loss=0.09705, pruned_loss=0.01395, audio_tagging_loss=0.00686, over 15351.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.0896, pruned_loss=0.01206, audio_tagging_loss=0.008424, over 3045920.07 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:42:42,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3662186.6666666665, ans=0.1 2023-11-28 20:42:55,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3662253.3333333335, ans=0.1 2023-11-28 20:43:00,544 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549350 2023-11-28 20:43:37,402 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8300, loss[loss=0.06182, simple_loss=0.08652, pruned_loss=0.0101, audio_tagging_loss=0.008463, over 15137.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08991, pruned_loss=0.0121, audio_tagging_loss=0.008433, over 3048907.22 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:43:41,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3662520.0, ans=10.0 2023-11-28 20:43:42,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3662520.0, ans=0.0 2023-11-28 20:43:46,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3662520.0, ans=0.07 2023-11-28 20:43:51,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3662586.6666666665, ans=0.05 2023-11-28 20:43:52,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3662586.6666666665, ans=0.0 2023-11-28 20:44:02,445 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549400 2023-11-28 20:44:05,011 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 9.200e+01 9.674e+01 1.037e+02 1.231e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 20:44:07,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2023-11-28 20:44:11,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3662653.3333333335, ans=0.125 2023-11-28 20:44:14,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3662720.0, ans=0.0 2023-11-28 20:44:24,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3662720.0, ans=0.2 2023-11-28 20:44:28,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3662786.6666666665, ans=0.0 2023-11-28 20:44:39,619 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8350, loss[loss=0.0706, simple_loss=0.1007, pruned_loss=0.01214, audio_tagging_loss=0.008092, over 15850.00 frames. 
], tot_loss[loss=0.06523, simple_loss=0.08963, pruned_loss=0.01198, audio_tagging_loss=0.008428, over 3043277.53 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:44:53,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3662920.0, ans=0.0 2023-11-28 20:45:04,349 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549450 2023-11-28 20:45:40,902 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8400, loss[loss=0.05793, simple_loss=0.07694, pruned_loss=0.01109, audio_tagging_loss=0.008361, over 15943.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0895, pruned_loss=0.012, audio_tagging_loss=0.008416, over 3047043.41 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:46:05,474 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549500 2023-11-28 20:46:07,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.725e+01 9.379e+01 9.933e+01 1.253e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 20:46:15,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3663320.0, ans=0.125 2023-11-28 20:46:21,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3663386.6666666665, ans=0.2 2023-11-28 20:46:26,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=15.0 2023-11-28 20:46:40,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3663453.3333333335, ans=0.125 2023-11-28 20:46:42,551 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8450, loss[loss=0.05887, simple_loss=0.07656, pruned_loss=0.0128, audio_tagging_loss=0.007796, over 16140.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.0888, pruned_loss=0.01173, audio_tagging_loss=0.008419, over 3050433.04 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:47:07,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549550 2023-11-28 20:47:21,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=22.5 2023-11-28 20:47:26,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0 2023-11-28 20:47:27,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3663720.0, ans=0.125 2023-11-28 20:47:42,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3663786.6666666665, ans=0.2 2023-11-28 20:47:44,287 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8500, loss[loss=0.05194, simple_loss=0.07356, pruned_loss=0.007399, audio_tagging_loss=0.007764, over 16332.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08938, pruned_loss=0.0118, audio_tagging_loss=0.00842, over 3053679.91 frames. 
], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:48:02,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3663920.0, ans=0.125 2023-11-28 20:48:05,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3663920.0, ans=0.1 2023-11-28 20:48:09,411 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549600 2023-11-28 20:48:11,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.752e+01 9.470e+01 1.030e+02 1.283e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 20:48:22,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2023-11-28 20:48:30,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0 2023-11-28 20:48:46,224 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8550, loss[loss=0.05179, simple_loss=0.06409, pruned_loss=0.009021, audio_tagging_loss=0.01072, over 16376.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08946, pruned_loss=0.0119, audio_tagging_loss=0.008491, over 3058870.00 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:48:58,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3664253.3333333335, ans=0.0 2023-11-28 20:49:02,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3664253.3333333335, ans=0.1 2023-11-28 20:49:10,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549650 2023-11-28 20:49:11,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3664320.0, ans=0.2 2023-11-28 20:49:13,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3664320.0, ans=0.07 2023-11-28 20:49:13,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3664320.0, ans=0.0 2023-11-28 20:49:18,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3664320.0, ans=0.125 2023-11-28 20:49:35,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3664453.3333333335, ans=0.0 2023-11-28 20:49:47,958 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8600, loss[loss=0.06767, simple_loss=0.09847, pruned_loss=0.01108, audio_tagging_loss=0.007349, over 14836.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08972, pruned_loss=0.01173, audio_tagging_loss=0.008581, over 3065580.38 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:49:49,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3664520.0, ans=0.1 2023-11-28 20:50:11,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3664653.3333333335, ans=0.05 2023-11-28 20:50:12,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549700 2023-11-28 20:50:12,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3664653.3333333335, ans=0.0 2023-11-28 20:50:16,007 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 8.961e+01 9.460e+01 1.024e+02 1.421e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 20:50:25,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3664720.0, ans=0.125 2023-11-28 20:50:26,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0 2023-11-28 20:50:29,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3664720.0, ans=0.2 2023-11-28 20:50:45,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3664786.6666666665, ans=0.125 2023-11-28 20:50:49,968 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8650, loss[loss=0.07664, simple_loss=0.1036, pruned_loss=0.01608, audio_tagging_loss=0.008748, over 15973.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08947, pruned_loss=0.01182, audio_tagging_loss=0.008603, over 3062460.28 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:50:54,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3664853.3333333335, ans=0.0 2023-11-28 20:50:58,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=12.0 2023-11-28 20:50:59,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3664853.3333333335, ans=0.2 2023-11-28 20:50:59,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3664853.3333333335, ans=0.125 2023-11-28 20:51:01,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3664920.0, ans=0.0 2023-11-28 20:51:03,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3664920.0, ans=0.125 2023-11-28 20:51:15,607 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549750 2023-11-28 20:51:16,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3664986.6666666665, ans=0.0 2023-11-28 20:51:19,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3664986.6666666665, ans=0.0 2023-11-28 20:51:23,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.91 vs. 
limit=10.0 2023-11-28 20:51:33,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3665053.3333333335, ans=0.0 2023-11-28 20:51:38,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3665120.0, ans=0.0 2023-11-28 20:51:40,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3665120.0, ans=0.125 2023-11-28 20:51:51,515 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8700, loss[loss=0.05938, simple_loss=0.0778, pruned_loss=0.009779, audio_tagging_loss=0.0107, over 14064.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09085, pruned_loss=0.01202, audio_tagging_loss=0.008629, over 3058540.80 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:51:55,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3665186.6666666665, ans=0.125 2023-11-28 20:52:01,430 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:52:12,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3665253.3333333335, ans=0.2 2023-11-28 20:52:16,833 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549800 2023-11-28 20:52:20,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.859e+01 9.712e+01 1.046e+02 1.344e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 20:52:23,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3665320.0, ans=0.125 2023-11-28 20:52:24,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-11-28 20:52:31,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3665386.6666666665, ans=0.125 2023-11-28 20:52:46,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3665453.3333333335, ans=0.125 2023-11-28 20:52:49,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3665453.3333333335, ans=0.1 2023-11-28 20:52:53,846 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8750, loss[loss=0.06487, simple_loss=0.08142, pruned_loss=0.01396, audio_tagging_loss=0.01019, over 14920.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09076, pruned_loss=0.01207, audio_tagging_loss=0.00875, over 3051511.22 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:53:06,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3665586.6666666665, ans=0.125 2023-11-28 20:53:12,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3665586.6666666665, ans=0.125 2023-11-28 20:53:18,473 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549850 2023-11-28 20:53:18,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.48 vs. 
limit=22.5 2023-11-28 20:53:53,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3665786.6666666665, ans=0.2 2023-11-28 20:53:55,649 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8800, loss[loss=0.09254, simple_loss=0.1341, pruned_loss=0.02014, audio_tagging_loss=0.005338, over 16152.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09129, pruned_loss=0.01219, audio_tagging_loss=0.008855, over 3055664.25 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:54:17,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2023-11-28 20:54:19,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549900 2023-11-28 20:54:23,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.083e+01 9.097e+01 9.649e+01 1.028e+02 1.198e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 20:54:52,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3666120.0, ans=0.125 2023-11-28 20:54:56,759 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8850, loss[loss=0.05686, simple_loss=0.06723, pruned_loss=0.01381, audio_tagging_loss=0.009434, over 15272.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09051, pruned_loss=0.01223, audio_tagging_loss=0.008906, over 3048942.05 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:54:58,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3666186.6666666665, ans=0.125 2023-11-28 20:55:09,121 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:55:09,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3666253.3333333335, ans=0.125 2023-11-28 20:55:10,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3666253.3333333335, ans=0.0 2023-11-28 20:55:14,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3666253.3333333335, ans=0.09899494936611666 2023-11-28 20:55:21,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549950 2023-11-28 20:55:35,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3666386.6666666665, ans=0.0 2023-11-28 20:55:40,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3666386.6666666665, ans=0.0 2023-11-28 20:55:58,453 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8900, loss[loss=0.05984, simple_loss=0.08457, pruned_loss=0.008812, audio_tagging_loss=0.008743, over 14807.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09128, pruned_loss=0.01232, audio_tagging_loss=0.008761, over 3045303.63 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:56:23,512 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550000 2023-11-28 20:56:23,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3666653.3333333335, ans=0.125 2023-11-28 20:56:29,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.959e+01 9.608e+01 1.039e+02 1.784e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 20:56:29,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3666653.3333333335, ans=0.125 2023-11-28 20:56:36,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3666720.0, ans=0.0 2023-11-28 20:56:43,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2023-11-28 20:56:44,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3666720.0, ans=0.125 2023-11-28 20:56:47,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3666786.6666666665, ans=0.1 2023-11-28 20:56:47,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3666786.6666666665, ans=0.1 2023-11-28 20:56:50,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3666786.6666666665, ans=0.1 2023-11-28 20:56:50,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3666786.6666666665, ans=0.125 2023-11-28 20:57:00,005 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8950, loss[loss=0.07266, simple_loss=0.1096, pruned_loss=0.01126, audio_tagging_loss=0.006621, over 15779.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09034, pruned_loss=0.01204, audio_tagging_loss=0.008624, over 3048024.48 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:57:15,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3666920.0, ans=0.2 2023-11-28 20:57:23,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3666986.6666666665, ans=0.2 2023-11-28 20:57:24,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550050 2023-11-28 20:57:27,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3666986.6666666665, ans=0.125 2023-11-28 20:57:40,619 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:58:02,917 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9000, loss[loss=0.06468, simple_loss=0.08679, pruned_loss=0.01252, audio_tagging_loss=0.008763, over 13846.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0905, pruned_loss=0.01204, audio_tagging_loss=0.008479, over 3055147.37 frames. 
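[Note on the `grad_scale` field in the summary lines: this is the dynamic fp16 loss scale, and its drift through this section (16.0 -> 32.0 -> 16.0 -> 8.0) is ordinary mixed-precision behavior — the scaler grows the scale after a stretch of finite gradients and halves it whenever an overflow is detected, skipping that optimizer step. A generic torch.cuda.amp sketch of the mechanism (illustrative, not a quote of train_asr.py):

    import torch

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

    for step in range(3):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(4, 10, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped if inf/nan gradients were found
        scaler.update()         # halved on overflow, grown after enough good steps
        print(scaler.get_scale())  # the number logged here as grad_scale
]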
], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:58:02,918 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 20:58:42,562 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05897, simple_loss=0.05047, pruned_loss=0.005253, audio_tagging_loss=0.02848, over 4681554.00 frames. 2023-11-28 20:58:42,562 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 20:58:54,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-28 20:58:59,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-28 20:59:02,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3667253.3333333335, ans=0.0 2023-11-28 20:59:07,580 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550100 2023-11-28 20:59:13,414 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.853e+01 9.549e+01 1.044e+02 1.258e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 20:59:16,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2023-11-28 20:59:24,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3667386.6666666665, ans=0.125 2023-11-28 20:59:24,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3667386.6666666665, ans=0.5 2023-11-28 20:59:26,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3667386.6666666665, ans=0.2 2023-11-28 20:59:27,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.47 vs. limit=15.0 2023-11-28 20:59:33,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3667453.3333333335, ans=0.0 2023-11-28 20:59:44,789 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9050, loss[loss=0.06617, simple_loss=0.1035, pruned_loss=0.007193, audio_tagging_loss=0.007231, over 15271.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09068, pruned_loss=0.01199, audio_tagging_loss=0.008331, over 3056324.95 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:59:55,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3667520.0, ans=0.5 2023-11-28 21:00:07,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3667653.3333333335, ans=0.0 2023-11-28 21:00:08,943 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550150 2023-11-28 21:00:11,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3667653.3333333335, ans=0.125 2023-11-28 21:00:25,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.66 vs. 
limit=10.0 2023-11-28 21:00:40,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-28 21:00:44,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.97 vs. limit=10.0 2023-11-28 21:00:46,671 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9100, loss[loss=0.09007, simple_loss=0.1287, pruned_loss=0.01951, audio_tagging_loss=0.006198, over 16417.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09072, pruned_loss=0.01213, audio_tagging_loss=0.008265, over 3054426.13 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:00:47,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-28 21:01:12,349 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550200 2023-11-28 21:01:16,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3667986.6666666665, ans=0.0 2023-11-28 21:01:18,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.959e+01 9.673e+01 1.042e+02 1.442e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 21:01:23,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.07 vs. limit=15.0 2023-11-28 21:01:27,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2023-11-28 21:01:48,500 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9150, loss[loss=0.06701, simple_loss=0.09458, pruned_loss=0.01102, audio_tagging_loss=0.0087, over 16552.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09103, pruned_loss=0.01222, audio_tagging_loss=0.008279, over 3053481.80 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:01:59,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3668253.3333333335, ans=0.125 2023-11-28 21:02:08,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2023-11-28 21:02:13,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550250 2023-11-28 21:02:14,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3668320.0, ans=0.125 2023-11-28 21:02:21,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3668320.0, ans=0.125 2023-11-28 21:02:50,542 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9200, loss[loss=0.05012, simple_loss=0.06416, pruned_loss=0.007561, audio_tagging_loss=0.01048, over 15050.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.09038, pruned_loss=0.01202, audio_tagging_loss=0.008234, over 3050974.18 frames. 
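[Note on the batch-9000 block above: this is the periodic mid-epoch validation. Training pauses (`Computing validation loss`), the model is scored on a fixed held-out set (4681554.00 frames per the log), and the peak CUDA memory allocated so far (24894MB) is reported before training resumes. A minimal sketch of such a validation pass — `compute_loss` is a hypothetical stand-in for the recipe's loss function:

    import torch

    def validate(model, dataloader, device) -> float:
        model.eval()
        total, frames = 0.0, 0
        with torch.no_grad():
            for batch in dataloader:
                loss, num_frames = compute_loss(model, batch)  # hypothetical helper
                total += loss.item() * num_frames
                frames += num_frames
        model.train()
        peak_mb = torch.cuda.max_memory_allocated(device) // 2**20
        print(f"Maximum memory allocated so far is {peak_mb}MB")
        return total / frames  # frame-weighted average, as in the log
]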
2023-11-28 21:02:55,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3668520.0, ans=0.0
2023-11-28 21:02:57,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3668520.0, ans=0.125
2023-11-28 21:03:05,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3668586.6666666665, ans=0.1
2023-11-28 21:03:06,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3668586.6666666665, ans=0.0
2023-11-28 21:03:14,990 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550300
2023-11-28 21:03:21,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.689e+01 9.468e+01 1.009e+02 1.302e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-28 21:03:39,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0
2023-11-28 21:03:40,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.50 vs. limit=10.0
2023-11-28 21:03:42,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3668786.6666666665, ans=0.0
2023-11-28 21:03:44,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3668786.6666666665, ans=0.0
2023-11-28 21:03:50,679 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 21:03:52,638 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9250, loss[loss=0.06136, simple_loss=0.09083, pruned_loss=0.009386, audio_tagging_loss=0.006555, over 15421.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08949, pruned_loss=0.01195, audio_tagging_loss=0.008339, over 3048970.95 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:04:12,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3668920.0, ans=0.07
2023-11-28 21:04:17,028 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550350
2023-11-28 21:04:41,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3669120.0, ans=0.05
2023-11-28 21:04:54,340 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9300, loss[loss=0.06271, simple_loss=0.08768, pruned_loss=0.01054, audio_tagging_loss=0.008324, over 15354.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08951, pruned_loss=0.01188, audio_tagging_loss=0.008369, over 3056875.33 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 8.0
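
The ScheduledFloat records throughout this section print the current value (ans) of a hyperparameter schedule evaluated at batch_count; batch_count advances by roughly 6.67 per training batch here, consistent with a batch index rescaled by world size and the max/reference duration ratio (an inference from the numbers, not a quote from the code). A hypothetical piecewise-linear schedule with clamped endpoints reproduces the flat values seen above:

    # Hypothetical piecewise-linear schedule; `points` are (batch_count, value) knots
    # in increasing batch_count order, clamped to the end values outside the range.
    def scheduled_float(batch_count, points):
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0

    # A skip-rate annealed from 0.5 to 0.0 over the first 20k batch_counts stays
    # pinned at 0.0 by batch_count=3668520.0, matching the ans=0.0 records above:
    print(scheduled_float(3668520.0, [(0.0, 0.5), (20000.0, 0.0)]))  # 0.0
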
2023-11-28 21:05:12,030 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 21:05:13,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3669253.3333333335, ans=0.2
2023-11-28 21:05:19,433 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550400
2023-11-28 21:05:22,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3669320.0, ans=0.1
2023-11-28 21:05:26,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.959e+01 9.529e+01 1.028e+02 1.391e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-28 21:05:55,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3669520.0, ans=0.0
2023-11-28 21:05:56,367 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9350, loss[loss=0.0368, simple_loss=0.04922, pruned_loss=0.003662, audio_tagging_loss=0.008527, over 15135.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08979, pruned_loss=0.01189, audio_tagging_loss=0.008404, over 3057430.04 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 8.0
2023-11-28 21:06:09,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3669586.6666666665, ans=0.125
2023-11-28 21:06:16,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2023-11-28 21:06:19,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3669653.3333333335, ans=0.2
2023-11-28 21:06:20,816 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550450
2023-11-28 21:06:26,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=22.5
2023-11-28 21:06:52,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3669786.6666666665, ans=0.0
2023-11-28 21:06:58,202 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9400, loss[loss=0.06865, simple_loss=0.0938, pruned_loss=0.01519, audio_tagging_loss=0.006553, over 15628.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08888, pruned_loss=0.0118, audio_tagging_loss=0.008586, over 3046871.41 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 8.0
2023-11-28 21:07:16,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3669920.0, ans=0.2
2023-11-28 21:07:17,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3669920.0, ans=0.07
2023-11-28 21:07:22,465 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550500
2023-11-28 21:07:25,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5
2023-11-28 21:07:28,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=3669986.6666666665, ans=12.0
2023-11-28 21:07:29,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3669986.6666666665, ans=0.125
2023-11-28 21:07:30,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.168e+01 9.669e+01 1.024e+02 1.175e+02, threshold=1.934e+02, percent-clipped=0.0
2023-11-28 21:07:44,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3670053.3333333335, ans=0.2
2023-11-28 21:07:56,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0
2023-11-28 21:07:58,066 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 21:07:59,924 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9450, loss[loss=0.05353, simple_loss=0.06399, pruned_loss=0.01134, audio_tagging_loss=0.0102, over 15335.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08847, pruned_loss=0.01171, audio_tagging_loss=0.008693, over 3051336.52 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 8.0
2023-11-28 21:08:03,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3670186.6666666665, ans=0.125
2023-11-28 21:08:08,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3670186.6666666665, ans=0.0
2023-11-28 21:08:13,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0
2023-11-28 21:08:24,097 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550550
2023-11-28 21:08:30,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3670320.0, ans=0.0
2023-11-28 21:08:33,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3670320.0, ans=0.0
2023-11-28 21:08:40,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3670386.6666666665, ans=0.0
2023-11-28 21:08:49,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3670453.3333333335, ans=0.125
2023-11-28 21:09:01,332 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9500, loss[loss=0.09339, simple_loss=0.1248, pruned_loss=0.02428, audio_tagging_loss=0.006694, over 15331.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.0887, pruned_loss=0.01178, audio_tagging_loss=0.008746, over 3045079.95 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 8.0
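
The Exclude-cut WARNING above is internally consistent: 100 input frames reduce to 23 after the frontend, which is fewer than the 24 BPE tokens, so no transducer alignment exists and the cut is skipped. The sketch below assumes a Zipformer-style frontend mapping T input frames to (T - 7) // 4 output frames; the exact filter in train_asr.py may differ:

    # Assumed frontend length arithmetic: 100 input frames -> 23 output frames,
    # matching the "before/after subsampling" counts in the WARNING above.
    def frames_after_subsampling(num_frames):
        return (num_frames - 7) // 4

    # A cut is presumably unusable when it has fewer output frames than target tokens.
    def keep_cut(num_frames, num_tokens):
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> excluded from training
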
2023-11-28 21:09:24,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3670653.3333333335, ans=0.125
2023-11-28 21:09:25,506 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550600
2023-11-28 21:09:27,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0
2023-11-28 21:09:33,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 9.097e+01 9.748e+01 1.049e+02 1.377e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-28 21:09:40,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3670720.0, ans=0.05
2023-11-28 21:09:43,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3670720.0, ans=0.125
2023-11-28 21:09:43,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3670720.0, ans=0.125
2023-11-28 21:09:49,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3670786.6666666665, ans=0.125
2023-11-28 21:10:03,508 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9550, loss[loss=0.07553, simple_loss=0.09709, pruned_loss=0.01702, audio_tagging_loss=0.009966, over 15172.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08833, pruned_loss=0.01186, audio_tagging_loss=0.008881, over 3047551.41 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 8.0
2023-11-28 21:10:06,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3670853.3333333335, ans=0.1
2023-11-28 21:10:27,517 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550650
2023-11-28 21:10:59,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=22.5
2023-11-28 21:11:04,570 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9600, loss[loss=0.08295, simple_loss=0.1162, pruned_loss=0.01833, audio_tagging_loss=0.006505, over 16161.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08811, pruned_loss=0.01173, audio_tagging_loss=0.008862, over 3046270.39 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:11:29,110 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550700
2023-11-28 21:11:36,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.932e+01 9.584e+01 1.005e+02 1.481e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-28 21:11:58,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3671453.3333333335, ans=0.1
2023-11-28 21:11:59,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3671453.3333333335, ans=0.125
2023-11-28 21:12:00,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3671453.3333333335, ans=0.0
2023-11-28 21:12:03,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0
2023-11-28 21:12:04,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3671453.3333333335, ans=0.1
2023-11-28 21:12:06,284 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9650, loss[loss=0.06329, simple_loss=0.07756, pruned_loss=0.01491, audio_tagging_loss=0.009597, over 15233.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08787, pruned_loss=0.01176, audio_tagging_loss=0.008831, over 3039356.82 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:12:31,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550750
2023-11-28 21:12:42,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0
2023-11-28 21:12:57,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3671786.6666666665, ans=0.0
2023-11-28 21:13:07,730 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9700, loss[loss=0.07355, simple_loss=0.1016, pruned_loss=0.01443, audio_tagging_loss=0.008329, over 15921.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08905, pruned_loss=0.01195, audio_tagging_loss=0.008674, over 3043150.71 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:13:23,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3671920.0, ans=0.125
2023-11-28 21:13:32,973 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550800
2023-11-28 21:13:34,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5
2023-11-28 21:13:40,230 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.004e+01 9.586e+01 1.046e+02 1.916e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-28 21:13:49,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3672053.3333333335, ans=0.125
2023-11-28 21:14:10,537 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9750, loss[loss=0.07151, simple_loss=0.1031, pruned_loss=0.01318, audio_tagging_loss=0.006766, over 15177.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08928, pruned_loss=0.01198, audio_tagging_loss=0.008559, over 3042423.81 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0
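
In every Clipping_scale record in this section the threshold equals, to the printed precision, 2.0 times the middle of the five grad-norm statistics (e.g. 2.0 * 9.584e+01 = 1.917e+02 in the 21:11:36 record above), so the five numbers read as min/25%/median/75%/max and the clipping threshold tracks the median. A sketch of that bookkeeping, an assumption rather than optim.py itself:

    import numpy as np

    # Assumed reading of the five numbers: min / 25% / median / 75% / max of recent
    # per-batch gradient norms, with threshold = clipping_scale * median.
    def clipping_stats(grad_norms, clipping_scale=2.0):
        quartiles = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * quartiles[2]
        percent_clipped = 100.0 * float(np.mean(grad_norms > threshold))
        return quartiles, threshold, percent_clipped

    # Reproducing the 21:11:36 record: median 9.584e+01 -> threshold 1.917e+02.
    _, threshold, _ = clipping_stats(np.array([72.16, 89.32, 95.84, 100.5, 148.1]))
    print(round(threshold, 2))  # 191.68
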
2023-11-28 21:14:14,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3672186.6666666665, ans=0.125
2023-11-28 21:14:20,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3672186.6666666665, ans=0.125
2023-11-28 21:14:35,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550850
2023-11-28 21:15:07,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3672453.3333333335, ans=0.125
2023-11-28 21:15:11,887 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9800, loss[loss=0.0558, simple_loss=0.07287, pruned_loss=0.01139, audio_tagging_loss=0.007982, over 14255.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08895, pruned_loss=0.01198, audio_tagging_loss=0.008494, over 3039327.42 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:15:36,355 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550900
2023-11-28 21:15:39,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3672653.3333333335, ans=0.0
2023-11-28 21:15:43,931 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.839e+01 9.561e+01 1.020e+02 1.364e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-28 21:15:51,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3672720.0, ans=0.02
2023-11-28 21:16:01,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3672786.6666666665, ans=0.125
2023-11-28 21:16:07,121 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 21:16:12,975 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9850, loss[loss=0.07729, simple_loss=0.1147, pruned_loss=0.01245, audio_tagging_loss=0.007479, over 15200.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09085, pruned_loss=0.01225, audio_tagging_loss=0.008352, over 3043360.77 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:16:13,257 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 21:16:17,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5
2023-11-28 21:16:26,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3672920.0, ans=0.95
2023-11-28 21:16:28,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2023-11-28 21:16:36,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3672920.0, ans=0.0
2023-11-28 21:16:38,331 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550950
2023-11-28 21:16:46,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3672986.6666666665, ans=0.125
2023-11-28 21:17:06,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3673120.0, ans=0.1
2023-11-28 21:17:10,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3673120.0, ans=0.125
2023-11-28 21:17:14,284 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9900, loss[loss=0.07192, simple_loss=0.1039, pruned_loss=0.01289, audio_tagging_loss=0.007061, over 15322.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09106, pruned_loss=0.01219, audio_tagging_loss=0.008211, over 3047814.18 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 8.0
2023-11-28 21:17:27,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3673253.3333333335, ans=0.125
2023-11-28 21:17:29,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3673253.3333333335, ans=0.125
2023-11-28 21:17:32,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3673253.3333333335, ans=0.2
2023-11-28 21:17:32,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0
2023-11-28 21:17:33,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3673253.3333333335, ans=0.0
2023-11-28 21:17:39,971 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551000
2023-11-28 21:17:47,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=22.5
2023-11-28 21:17:48,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.969e+01 9.517e+01 1.006e+02 1.259e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-28 21:17:48,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3673320.0, ans=0.5
2023-11-28 21:18:14,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3673453.3333333335, ans=0.0
2023-11-28 21:18:16,952 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9950, loss[loss=0.07293, simple_loss=0.1076, pruned_loss=0.01234, audio_tagging_loss=0.006774, over 15703.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.091, pruned_loss=0.01216, audio_tagging_loss=0.008266, over 3047678.06 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 8.0
2023-11-28 21:18:38,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3673586.6666666665, ans=0.0
2023-11-28 21:18:41,971 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551050
2023-11-28 21:18:43,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3673653.3333333335, ans=0.125
2023-11-28 21:19:18,497 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10000, loss[loss=0.05902, simple_loss=0.07786, pruned_loss=0.01247, audio_tagging_loss=0.007622, over 15049.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08991, pruned_loss=0.01192, audio_tagging_loss=0.008276, over 3043152.56 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:19:32,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3673920.0, ans=0.0
2023-11-28 21:19:34,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3673920.0, ans=0.2
2023-11-28 21:19:42,976 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551100
2023-11-28 21:19:49,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3673986.6666666665, ans=0.125
2023-11-28 21:19:51,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 9.118e+01 9.727e+01 1.019e+02 1.264e+02, threshold=1.945e+02, percent-clipped=0.0
2023-11-28 21:20:02,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3674053.3333333335, ans=0.07
2023-11-28 21:20:03,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3674053.3333333335, ans=0.125
2023-11-28 21:20:07,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=22.5
2023-11-28 21:20:10,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3674120.0, ans=0.125
2023-11-28 21:20:13,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3674120.0, ans=0.1
2023-11-28 21:20:19,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3674186.6666666665, ans=0.0
2023-11-28 21:20:20,075 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10050, loss[loss=0.06545, simple_loss=0.07858, pruned_loss=0.01227, audio_tagging_loss=0.0139, over 13734.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08962, pruned_loss=0.01194, audio_tagging_loss=0.008371, over 3039115.84 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:20:22,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3674186.6666666665, ans=0.125
2023-11-28 21:20:27,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3674186.6666666665, ans=0.2
2023-11-28 21:20:40,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3674253.3333333335, ans=0.05
2023-11-28 21:20:46,157 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551150
2023-11-28 21:21:10,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3674453.3333333335, ans=0.0
2023-11-28 21:21:22,603 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10100, loss[loss=0.06134, simple_loss=0.08952, pruned_loss=0.01042, audio_tagging_loss=0.006165, over 14779.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08996, pruned_loss=0.01197, audio_tagging_loss=0.008421, over 3040322.65 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:21:36,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3674586.6666666665, ans=0.125
2023-11-28 21:21:46,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551200
2023-11-28 21:21:54,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3674653.3333333335, ans=0.1
2023-11-28 21:21:55,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.986e+01 9.697e+01 1.060e+02 1.407e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-28 21:21:59,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3674720.0, ans=0.1
2023-11-28 21:22:13,480 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 21:22:24,617 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10150, loss[loss=0.05062, simple_loss=0.06329, pruned_loss=0.008874, audio_tagging_loss=0.0101, over 14934.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08994, pruned_loss=0.01193, audio_tagging_loss=0.008527, over 3051027.17 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:22:32,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3674853.3333333335, ans=0.1
2023-11-28 21:22:42,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3674920.0, ans=0.2
2023-11-28 21:22:49,251 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551250
2023-11-28 21:22:52,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3674986.6666666665, ans=0.0
2023-11-28 21:22:54,615 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 21:23:03,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3675053.3333333335, ans=0.0
2023-11-28 21:23:04,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3675053.3333333335, ans=0.125
2023-11-28 21:23:26,861 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10200, loss[loss=0.06833, simple_loss=0.09447, pruned_loss=0.01177, audio_tagging_loss=0.009322, over 15315.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08923, pruned_loss=0.0118, audio_tagging_loss=0.008703, over 3053908.76 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:23:49,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3675253.3333333335, ans=15.0
2023-11-28 21:23:50,120 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 21:23:50,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3675320.0, ans=0.125
2023-11-28 21:23:51,340 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551300
2023-11-28 21:24:00,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.789e+01 9.521e+01 1.014e+02 1.270e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-28 21:24:00,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3675320.0, ans=0.125
2023-11-28 21:24:11,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3675386.6666666665, ans=0.125
2023-11-28 21:24:28,413 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10250, loss[loss=0.06809, simple_loss=0.09127, pruned_loss=0.01378, audio_tagging_loss=0.008667, over 15301.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08918, pruned_loss=0.01188, audio_tagging_loss=0.008811, over 3055203.84 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:24:28,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3675520.0, ans=0.0
2023-11-28 21:24:48,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3675586.6666666665, ans=0.125
2023-11-28 21:24:50,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3675586.6666666665, ans=0.0
2023-11-28 21:24:53,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551350
2023-11-28 21:25:09,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3675720.0, ans=0.125
2023-11-28 21:25:14,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5
2023-11-28 21:25:15,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=22.5
2023-11-28 21:25:16,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3675786.6666666665, ans=0.2
2023-11-28 21:25:27,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0
2023-11-28 21:25:30,779 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10300, loss[loss=0.07246, simple_loss=0.09476, pruned_loss=0.01523, audio_tagging_loss=0.009846, over 15024.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08809, pruned_loss=0.01175, audio_tagging_loss=0.008833, over 3053132.39 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:25:52,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3675920.0, ans=0.125
2023-11-28 21:25:54,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551400
2023-11-28 21:25:55,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0
2023-11-28 21:26:04,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 9.006e+01 9.446e+01 1.010e+02 1.376e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-28 21:26:16,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=22.5
2023-11-28 21:26:32,629 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10350, loss[loss=0.06966, simple_loss=0.1024, pruned_loss=0.008258, audio_tagging_loss=0.01021, over 15108.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08797, pruned_loss=0.01171, audio_tagging_loss=0.008992, over 3053309.83 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:26:56,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551450
2023-11-28 21:26:58,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. limit=10.0
2023-11-28 21:26:58,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3676320.0, ans=10.0
2023-11-28 21:27:16,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3676386.6666666665, ans=0.05
2023-11-28 21:27:16,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0
2023-11-28 21:27:22,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=22.5
2023-11-28 21:27:33,567 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10400, loss[loss=0.06016, simple_loss=0.08263, pruned_loss=0.01041, audio_tagging_loss=0.00843, over 16208.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08805, pruned_loss=0.01171, audio_tagging_loss=0.008966, over 3054906.74 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:27:58,428 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551500
2023-11-28 21:28:07,100 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.953e+01 9.462e+01 1.003e+02 1.279e+02, threshold=1.892e+02, percent-clipped=0.0
2023-11-28 21:28:18,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-28 21:28:20,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0
2023-11-28 21:28:22,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3676786.6666666665, ans=0.0
2023-11-28 21:28:32,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3676786.6666666665, ans=0.05
2023-11-28 21:28:35,110 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10450, loss[loss=0.06579, simple_loss=0.09194, pruned_loss=0.01496, audio_tagging_loss=0.004863, over 14083.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08801, pruned_loss=0.01173, audio_tagging_loss=0.00899, over 3051555.37 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:29:00,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551550
2023-11-28 21:29:00,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0
2023-11-28 21:29:11,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0
2023-11-28 21:29:18,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3677053.3333333335, ans=0.125
2023-11-28 21:29:22,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3677053.3333333335, ans=0.0
2023-11-28 21:29:37,888 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10500, loss[loss=0.09557, simple_loss=0.1391, pruned_loss=0.02029, audio_tagging_loss=0.005753, over 16611.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08796, pruned_loss=0.01165, audio_tagging_loss=0.008861, over 3057311.11 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0
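
The Whitening records compare a per-module statistic against a scheduled limit (6.0 for attention keys, 10.0 to 22.5 elsewhere). One standard whiteness statistic with matching behavior, equal to 1.0 for a perfectly white feature covariance and growing as energy concentrates in fewer directions, is num_channels * trace(C^2) / trace(C)^2; whether scaling.py uses exactly this form is an assumption:

    import numpy as np

    # Assumed whiteness metric: d * trace(C @ C) / trace(C)**2 for feature covariance C.
    # It is 1.0 when C is proportional to the identity and approaches d (num_channels)
    # when all of the energy collapses onto a single direction.
    def whitening_metric(features):
        cov = np.cov(features, rowvar=False)   # (d, d) covariance over frames
        d = cov.shape[0]
        return d * np.trace(cov @ cov) / np.trace(cov) ** 2

    rng = np.random.default_rng(0)
    print(whitening_metric(rng.standard_normal((1000, 256))))  # ~1.26, far below a limit like 15.0
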
2023-11-28 21:29:38,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3677186.6666666665, ans=0.5
2023-11-28 21:29:52,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3677253.3333333335, ans=0.0
2023-11-28 21:29:57,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3677253.3333333335, ans=0.125
2023-11-28 21:30:02,186 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551600
2023-11-28 21:30:11,074 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.780e+01 9.536e+01 1.016e+02 1.256e+02, threshold=1.907e+02, percent-clipped=0.0
2023-11-28 21:30:39,048 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10550, loss[loss=0.05974, simple_loss=0.08415, pruned_loss=0.01001, audio_tagging_loss=0.007654, over 15590.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.0892, pruned_loss=0.01187, audio_tagging_loss=0.008665, over 3056376.36 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:30:53,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3677586.6666666665, ans=0.125
2023-11-28 21:30:57,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3677586.6666666665, ans=0.1
2023-11-28 21:31:04,421 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551650
2023-11-28 21:31:18,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3677720.0, ans=0.5
2023-11-28 21:31:19,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3677720.0, ans=0.2
2023-11-28 21:31:20,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3677720.0, ans=0.1
2023-11-28 21:31:20,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3677720.0, ans=0.07
2023-11-28 21:31:40,747 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10600, loss[loss=0.07121, simple_loss=0.1006, pruned_loss=0.01445, audio_tagging_loss=0.00646, over 15309.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08921, pruned_loss=0.01187, audio_tagging_loss=0.008621, over 3055375.39 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:32:05,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551700
2023-11-28 21:32:14,115 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 9.049e+01 9.693e+01 1.043e+02 1.339e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-28 21:32:34,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3678120.0, ans=6.0
2023-11-28 21:32:34,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3678120.0, ans=0.015
2023-11-28 21:32:43,461 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10650, loss[loss=0.08334, simple_loss=0.1166, pruned_loss=0.0189, audio_tagging_loss=0.006157, over 14043.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08911, pruned_loss=0.01197, audio_tagging_loss=0.008643, over 3052915.08 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:32:43,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3678186.6666666665, ans=0.125
2023-11-28 21:32:51,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.49 vs. limit=22.5
2023-11-28 21:32:54,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3678253.3333333335, ans=0.1
2023-11-28 21:33:00,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3678253.3333333335, ans=0.125
2023-11-28 21:33:08,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551750
2023-11-28 21:33:32,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0
2023-11-28 21:33:39,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3678453.3333333335, ans=0.0
2023-11-28 21:33:45,306 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10700, loss[loss=0.06818, simple_loss=0.09637, pruned_loss=0.01287, audio_tagging_loss=0.007133, over 15340.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09092, pruned_loss=0.01229, audio_tagging_loss=0.008569, over 3049272.94 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:34:07,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3678586.6666666665, ans=0.1
2023-11-28 21:34:07,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=15.0
2023-11-28 21:34:10,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551800
2023-11-28 21:34:16,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3678653.3333333335, ans=0.125
2023-11-28 21:34:19,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 9.110e+01 9.632e+01 1.031e+02 2.472e+02, threshold=1.926e+02, percent-clipped=1.0
2023-11-28 21:34:20,721 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 21:34:42,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3678786.6666666665, ans=0.125
2023-11-28 21:34:48,474 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10750, loss[loss=0.07065, simple_loss=0.09914, pruned_loss=0.01384, audio_tagging_loss=0.007234, over 16395.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09079, pruned_loss=0.01237, audio_tagging_loss=0.00858, over 3049100.13 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:34:56,129 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 21:35:13,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551850
2023-11-28 21:35:13,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3678986.6666666665, ans=0.125
2023-11-28 21:35:21,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3678986.6666666665, ans=0.2
2023-11-28 21:35:24,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3679053.3333333335, ans=0.125
2023-11-28 21:35:40,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3679120.0, ans=0.1
2023-11-28 21:35:48,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3679186.6666666665, ans=0.1
2023-11-28 21:35:49,811 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10800, loss[loss=0.07232, simple_loss=0.09704, pruned_loss=0.01486, audio_tagging_loss=0.008943, over 15354.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09106, pruned_loss=0.01239, audio_tagging_loss=0.008395, over 3048396.17 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0
2023-11-28 21:35:52,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3679186.6666666665, ans=0.0
2023-11-28 21:35:58,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3679186.6666666665, ans=0.2
2023-11-28 21:36:15,179 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551900
2023-11-28 21:36:18,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3679320.0, ans=0.0
2023-11-28 21:36:21,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3679320.0, ans=0.125
2023-11-28 21:36:24,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.859e+01 9.432e+01 1.041e+02 1.593e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-28 21:36:38,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=15.0
2023-11-28 21:36:47,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3679453.3333333335, ans=0.0
2023-11-28 21:36:50,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3679520.0, ans=0.04949747468305833
2023-11-28 21:36:51,863 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10850, loss[loss=0.06112, simple_loss=0.08715, pruned_loss=0.008978, audio_tagging_loss=0.008567, over 15325.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09043, pruned_loss=0.01207, audio_tagging_loss=0.008491, over 3052550.43 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:36:57,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0
2023-11-28 21:37:14,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3679586.6666666665, ans=0.09899494936611666
2023-11-28 21:37:16,565 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551950
2023-11-28 21:37:35,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3679720.0, ans=0.0
2023-11-28 21:37:50,017 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 21:37:53,410 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10900, loss[loss=0.04169, simple_loss=0.05319, pruned_loss=0.003967, audio_tagging_loss=0.01112, over 16721.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08992, pruned_loss=0.0121, audio_tagging_loss=0.00858, over 3052174.39 frames. ], batch size: 65, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:37:53,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3679853.3333333335, ans=0.0
2023-11-28 21:38:08,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3679920.0, ans=0.0
2023-11-28 21:38:18,008 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552000
2023-11-28 21:38:26,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3679986.6666666665, ans=0.0
2023-11-28 21:38:30,859 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 9.027e+01 9.612e+01 1.023e+02 1.534e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-28 21:38:36,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3680053.3333333335, ans=0.1
2023-11-28 21:38:40,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3680053.3333333335, ans=10.0
2023-11-28 21:38:49,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3680120.0, ans=0.0
2023-11-28 21:38:53,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3680120.0, ans=0.02
2023-11-28 21:38:57,989 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10950, loss[loss=0.04999, simple_loss=0.07138, pruned_loss=0.004874, audio_tagging_loss=0.009432, over 15390.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08975, pruned_loss=0.0122, audio_tagging_loss=0.008623, over 3053368.25 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:38:59,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3680186.6666666665, ans=0.125
2023-11-28 21:39:09,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3680253.3333333335, ans=0.125
2023-11-28 21:39:11,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3680253.3333333335, ans=0.125
2023-11-28 21:39:16,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3680253.3333333335, ans=0.125
2023-11-28 21:39:19,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0
2023-11-28 21:39:20,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3680253.3333333335, ans=0.125
2023-11-28 21:39:23,187 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552050
2023-11-28 21:39:28,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3680320.0, ans=0.0
2023-11-28 21:39:35,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3680386.6666666665, ans=0.125
2023-11-28 21:39:37,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3680386.6666666665, ans=0.1
2023-11-28 21:39:38,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. limit=6.0
2023-11-28 21:39:42,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3680386.6666666665, ans=0.0
2023-11-28 21:39:46,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0
2023-11-28 21:39:47,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3680453.3333333335, ans=0.125
2023-11-28 21:39:48,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3680453.3333333335, ans=0.125
2023-11-28 21:39:52,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3680453.3333333335, ans=0.0
2023-11-28 21:39:54,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3680453.3333333335, ans=0.1
2023-11-28 21:39:59,067 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11000, loss[loss=0.07982, simple_loss=0.1145, pruned_loss=0.01596, audio_tagging_loss=0.006616, over 15819.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08841, pruned_loss=0.01206, audio_tagging_loss=0.008686, over 3047223.14 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:40:09,708 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 21:40:13,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3680586.6666666665, ans=0.0
2023-11-28 21:40:16,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3680586.6666666665, ans=0.0
2023-11-28 21:40:23,612 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552100
2023-11-28 21:40:24,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.92 vs. limit=10.0
2023-11-28 21:40:28,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3680653.3333333335, ans=0.2
2023-11-28 21:40:33,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.817e+01 9.387e+01 9.947e+01 1.401e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-28 21:40:33,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3680653.3333333335, ans=0.0
2023-11-28 21:41:01,322 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11050, loss[loss=0.07227, simple_loss=0.09968, pruned_loss=0.01512, audio_tagging_loss=0.007306, over 14520.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08863, pruned_loss=0.01216, audio_tagging_loss=0.008742, over 3047282.19 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:41:14,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3680920.0, ans=0.125
2023-11-28 21:41:25,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552150
2023-11-28 21:41:32,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3680986.6666666665, ans=0.2
2023-11-28 21:41:36,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3681053.3333333335, ans=0.125
2023-11-28 21:41:39,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3681053.3333333335, ans=0.0
2023-11-28 21:41:39,569 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 21:42:00,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3681120.0, ans=0.5
2023-11-28 21:42:01,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3681186.6666666665, ans=0.125
2023-11-28 21:42:02,716 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11100, loss[loss=0.08321, simple_loss=0.122, pruned_loss=0.01385, audio_tagging_loss=0.008362, over 17063.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08909, pruned_loss=0.01206, audio_tagging_loss=0.008828, over 3046367.25 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 16.0
2023-11-28 21:42:08,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3681186.6666666665, ans=0.125
2023-11-28 21:42:16,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0
limit=15.0 2023-11-28 21:42:27,659 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552200 2023-11-28 21:42:36,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3681320.0, ans=0.125 2023-11-28 21:42:37,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.965e+01 9.771e+01 1.048e+02 1.332e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-28 21:42:38,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3681320.0, ans=0.0 2023-11-28 21:42:39,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0 2023-11-28 21:43:04,748 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11150, loss[loss=0.07825, simple_loss=0.1167, pruned_loss=0.01342, audio_tagging_loss=0.006473, over 14567.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08984, pruned_loss=0.01204, audio_tagging_loss=0.008858, over 3050635.26 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:43:06,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3681520.0, ans=0.125 2023-11-28 21:43:07,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3681520.0, ans=0.025 2023-11-28 21:43:21,311 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:43:28,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3681653.3333333335, ans=0.125 2023-11-28 21:43:29,335 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552250 2023-11-28 21:43:36,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0 2023-11-28 21:43:38,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3681653.3333333335, ans=0.0 2023-11-28 21:43:45,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3681720.0, ans=0.0 2023-11-28 21:43:45,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3681720.0, ans=0.125 2023-11-28 21:43:46,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3681720.0, ans=0.04949747468305833 2023-11-28 21:43:54,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3681786.6666666665, ans=0.0 2023-11-28 21:44:00,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3681786.6666666665, ans=0.125 2023-11-28 21:44:06,243 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11200, loss[loss=0.06453, simple_loss=0.08589, pruned_loss=0.01258, audio_tagging_loss=0.00901, over 15584.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08872, pruned_loss=0.01198, audio_tagging_loss=0.00899, over 3052864.30 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:44:08,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3681853.3333333335, ans=0.125 2023-11-28 21:44:25,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3681920.0, ans=0.125 2023-11-28 21:44:30,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552300 2023-11-28 21:44:33,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3681986.6666666665, ans=0.125 2023-11-28 21:44:41,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.920e+01 9.511e+01 1.032e+02 1.205e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 21:44:41,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3682053.3333333335, ans=0.125 2023-11-28 21:44:47,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3682053.3333333335, ans=0.1 2023-11-28 21:45:08,006 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11250, loss[loss=0.05027, simple_loss=0.06946, pruned_loss=0.00648, audio_tagging_loss=0.009057, over 14369.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08739, pruned_loss=0.01159, audio_tagging_loss=0.009067, over 3046931.71 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:45:21,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3682253.3333333335, ans=0.125 2023-11-28 21:45:32,179 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552350 2023-11-28 21:46:02,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3682453.3333333335, ans=0.1 2023-11-28 21:46:09,239 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11300, loss[loss=0.08353, simple_loss=0.1187, pruned_loss=0.01787, audio_tagging_loss=0.006331, over 14833.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08848, pruned_loss=0.01181, audio_tagging_loss=0.008843, over 3044469.55 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:46:31,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3682586.6666666665, ans=0.0 2023-11-28 21:46:34,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552400 2023-11-28 21:46:46,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.990e+01 9.658e+01 1.057e+02 1.418e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 21:46:53,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3682720.0, ans=0.0 2023-11-28 21:47:02,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-11-28 21:47:12,710 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11350, loss[loss=0.06011, simple_loss=0.08102, pruned_loss=0.01025, audio_tagging_loss=0.009346, over 15438.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08886, pruned_loss=0.01192, audio_tagging_loss=0.008691, over 3046569.51 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:47:15,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3682853.3333333335, ans=0.125 2023-11-28 21:47:33,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3682920.0, ans=0.125 2023-11-28 21:47:37,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552450 2023-11-28 21:47:48,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3683053.3333333335, ans=0.0 2023-11-28 21:48:12,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3683120.0, ans=0.125 2023-11-28 21:48:14,191 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11400, loss[loss=0.05715, simple_loss=0.07719, pruned_loss=0.0101, audio_tagging_loss=0.008455, over 14565.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08882, pruned_loss=0.01184, audio_tagging_loss=0.00868, over 3040877.24 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:48:15,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3683186.6666666665, ans=0.125 2023-11-28 21:48:38,568 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552500 2023-11-28 21:48:49,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.845e+01 9.711e+01 1.056e+02 1.187e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 21:48:53,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2023-11-28 21:48:55,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3683386.6666666665, ans=0.125 2023-11-28 21:49:00,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3683386.6666666665, ans=0.2 2023-11-28 21:49:16,150 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11450, loss[loss=0.04614, simple_loss=0.05769, pruned_loss=0.007009, audio_tagging_loss=0.01029, over 15538.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.0884, pruned_loss=0.01187, audio_tagging_loss=0.008638, over 3040962.02 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:49:23,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3683520.0, ans=0.125 2023-11-28 21:49:23,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3683520.0, ans=0.125 2023-11-28 21:49:27,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-28 21:49:40,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552550 2023-11-28 21:50:16,983 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11500, loss[loss=0.05931, simple_loss=0.08409, pruned_loss=0.0103, audio_tagging_loss=0.006964, over 15953.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.0882, pruned_loss=0.01179, audio_tagging_loss=0.008559, over 3045280.29 frames. 
], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:50:42,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552600 2023-11-28 21:50:44,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3683986.6666666665, ans=0.125 2023-11-28 21:50:51,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3683986.6666666665, ans=0.125 2023-11-28 21:50:53,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.968e+01 8.799e+01 9.432e+01 1.014e+02 1.264e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 21:50:54,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0 2023-11-28 21:50:56,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3684053.3333333335, ans=0.125 2023-11-28 21:51:12,943 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:51:15,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3684120.0, ans=0.125 2023-11-28 21:51:18,520 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11550, loss[loss=0.04804, simple_loss=0.0697, pruned_loss=0.006596, audio_tagging_loss=0.006592, over 14842.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0887, pruned_loss=0.01196, audio_tagging_loss=0.008539, over 3044708.96 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:51:43,983 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552650 2023-11-28 21:51:44,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3684320.0, ans=0.125 2023-11-28 21:51:57,573 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:52:08,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3684453.3333333335, ans=0.0 2023-11-28 21:52:13,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2023-11-28 21:52:16,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-11-28 21:52:18,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3684453.3333333335, ans=0.04949747468305833 2023-11-28 21:52:20,639 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11600, loss[loss=0.05964, simple_loss=0.07902, pruned_loss=0.01079, audio_tagging_loss=0.009334, over 14515.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08937, pruned_loss=0.01204, audio_tagging_loss=0.008503, over 3043633.55 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:52:24,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.90 vs. limit=10.0 2023-11-28 21:52:45,063 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552700 2023-11-28 21:52:55,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 9.024e+01 9.602e+01 1.030e+02 1.712e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 21:53:02,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3684720.0, ans=0.2 2023-11-28 21:53:04,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3684720.0, ans=0.2 2023-11-28 21:53:13,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3684786.6666666665, ans=0.125 2023-11-28 21:53:21,392 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11650, loss[loss=0.07068, simple_loss=0.104, pruned_loss=0.0119, audio_tagging_loss=0.006753, over 15006.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08917, pruned_loss=0.0121, audio_tagging_loss=0.008571, over 3042661.50 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:53:31,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3684853.3333333335, ans=0.125 2023-11-28 21:53:32,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3684920.0, ans=0.1 2023-11-28 21:53:46,735 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552750 2023-11-28 21:53:50,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3684986.6666666665, ans=0.125 2023-11-28 21:54:22,814 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11700, loss[loss=0.04911, simple_loss=0.06583, pruned_loss=0.008064, audio_tagging_loss=0.008131, over 15254.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08852, pruned_loss=0.01206, audio_tagging_loss=0.008673, over 3044835.70 frames. 
], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:54:35,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3685253.3333333335, ans=0.1 2023-11-28 21:54:35,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3685253.3333333335, ans=0.0 2023-11-28 21:54:40,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3685253.3333333335, ans=0.125 2023-11-28 21:54:48,035 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552800 2023-11-28 21:54:59,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.206e+01 9.735e+01 1.055e+02 1.331e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 21:55:07,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3685386.6666666665, ans=0.125 2023-11-28 21:55:18,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3685453.3333333335, ans=0.0 2023-11-28 21:55:24,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3685520.0, ans=0.125 2023-11-28 21:55:24,994 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11750, loss[loss=0.07517, simple_loss=0.09977, pruned_loss=0.01622, audio_tagging_loss=0.009068, over 15056.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08939, pruned_loss=0.01206, audio_tagging_loss=0.008584, over 3048867.72 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:55:25,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-28 21:55:27,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3685520.0, ans=0.125 2023-11-28 21:55:29,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.36 vs. 
limit=12.0 2023-11-28 21:55:36,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3685586.6666666665, ans=0.125 2023-11-28 21:55:37,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3685586.6666666665, ans=0.0 2023-11-28 21:55:43,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3685586.6666666665, ans=0.125 2023-11-28 21:55:48,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-28 21:55:49,512 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552850 2023-11-28 21:55:53,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-28 21:56:07,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3685720.0, ans=0.125 2023-11-28 21:56:09,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3685720.0, ans=0.125 2023-11-28 21:56:14,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3685786.6666666665, ans=0.0 2023-11-28 21:56:18,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3685786.6666666665, ans=0.125 2023-11-28 21:56:26,111 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11800, loss[loss=0.06345, simple_loss=0.08491, pruned_loss=0.009486, audio_tagging_loss=0.01151, over 15840.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08953, pruned_loss=0.01208, audio_tagging_loss=0.008547, over 3044559.87 frames. 
], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:56:36,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3685853.3333333335, ans=0.0 2023-11-28 21:56:39,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3685920.0, ans=0.0 2023-11-28 21:56:48,242 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:56:50,953 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552900 2023-11-28 21:56:52,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3685986.6666666665, ans=0.1 2023-11-28 21:57:02,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.813e+01 9.510e+01 1.037e+02 1.447e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 21:57:08,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3686053.3333333335, ans=0.0 2023-11-28 21:57:11,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3686053.3333333335, ans=0.125 2023-11-28 21:57:18,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3686120.0, ans=0.0 2023-11-28 21:57:22,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3686120.0, ans=0.125 2023-11-28 21:57:28,209 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11850, loss[loss=0.06096, simple_loss=0.08377, pruned_loss=0.009649, audio_tagging_loss=0.009424, over 15851.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08939, pruned_loss=0.0119, audio_tagging_loss=0.008625, over 3043095.19 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:57:53,521 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552950 2023-11-28 21:58:00,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3686320.0, ans=0.0 2023-11-28 21:58:12,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3686386.6666666665, ans=0.2 2023-11-28 21:58:19,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3686453.3333333335, ans=0.0 2023-11-28 21:58:23,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=22.5 2023-11-28 21:58:29,183 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11900, loss[loss=0.0537, simple_loss=0.07029, pruned_loss=0.009171, audio_tagging_loss=0.009388, over 16037.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08939, pruned_loss=0.01196, audio_tagging_loss=0.008774, over 3044518.85 frames. 
], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:58:54,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553000 2023-11-28 21:59:05,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.697e+01 9.440e+01 1.029e+02 1.196e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 21:59:25,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3686786.6666666665, ans=0.125 2023-11-28 21:59:28,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3686786.6666666665, ans=0.0 2023-11-28 21:59:32,209 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11950, loss[loss=0.04646, simple_loss=0.06013, pruned_loss=0.004929, audio_tagging_loss=0.01147, over 14029.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08908, pruned_loss=0.01199, audio_tagging_loss=0.008796, over 3044363.91 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:59:41,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3686853.3333333335, ans=0.125 2023-11-28 21:59:45,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5 2023-11-28 21:59:55,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3686986.6666666665, ans=0.95 2023-11-28 21:59:56,176 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553050 2023-11-28 21:59:56,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.52 vs. limit=22.5 2023-11-28 21:59:59,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3686986.6666666665, ans=0.125 2023-11-28 22:00:08,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=22.5 2023-11-28 22:00:23,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3687120.0, ans=0.0 2023-11-28 22:00:27,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3687120.0, ans=0.125 2023-11-28 22:00:31,715 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 12000, loss[loss=0.0547, simple_loss=0.07738, pruned_loss=0.007077, audio_tagging_loss=0.008931, over 14951.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08937, pruned_loss=0.01205, audio_tagging_loss=0.00889, over 3043895.49 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 22:00:31,716 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 22:01:11,991 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05835, simple_loss=0.05054, pruned_loss=0.005304, audio_tagging_loss=0.02778, over 4681554.00 frames. 
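The tot_loss entries above report loss together with simple_loss, pruned_loss and audio_tagging_loss, but loss is not the plain sum of the three components. A minimal sketch of one plausible combination, assuming (an assumption, not something these log lines state) that simple_loss is down-weighted by 0.5 while the pruned and audio-tagging terms enter at full weight:

def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5):
    # Hypothetical helper for illustration only; the 0.5 scale on
    # simple_loss is an assumed value, and this is not the training
    # script's actual code.
    return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

# Components from the 'Epoch 46, batch 12000' tot_loss entry above:
print(combine_losses(0.08937, 0.01205, 0.00889))  # ~0.06563

Under this assumption the logged numbers are consistent: 0.5 * 0.08937 + 0.01205 + 0.00889 ≈ 0.06563, matching the logged loss=0.06563 up to rounding of the displayed components.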
2023-11-28 22:01:11,992 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 22:01:33,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3687320.0, ans=0.125 2023-11-28 22:01:34,493 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553100 2023-11-28 22:01:56,253 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 0, loss[loss=0.06296, simple_loss=0.0782, pruned_loss=0.005664, audio_tagging_loss=0.0182, over 15074.00 frames. ], tot_loss[loss=0.06296, simple_loss=0.0782, pruned_loss=0.005664, audio_tagging_loss=0.0182, over 15074.00 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:01:56,254 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 22:02:32,347 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05784, simple_loss=0.05051, pruned_loss=0.005299, audio_tagging_loss=0.02728, over 4681554.00 frames. 2023-11-28 22:02:32,347 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 22:02:33,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3687340.0, ans=10.0 2023-11-28 22:02:39,328 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 9.135e+01 9.831e+01 1.074e+02 1.367e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-28 22:02:40,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3687340.0, ans=0.2 2023-11-28 22:02:59,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3687473.3333333335, ans=0.0 2023-11-28 22:03:30,822 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553150 2023-11-28 22:03:34,258 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 50, loss[loss=0.06549, simple_loss=0.07889, pruned_loss=0.009408, audio_tagging_loss=0.01664, over 13574.00 frames. ], tot_loss[loss=0.07319, simple_loss=0.08903, pruned_loss=0.01226, audio_tagging_loss=0.01641, over 690772.15 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:03:52,860 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:03:53,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3687740.0, ans=0.1 2023-11-28 22:03:54,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.82 vs. limit=15.0 2023-11-28 22:04:33,317 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553200 2023-11-28 22:04:37,282 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 100, loss[loss=0.06462, simple_loss=0.07872, pruned_loss=0.00902, audio_tagging_loss=0.01624, over 14824.00 frames. ], tot_loss[loss=0.07284, simple_loss=0.09046, pruned_loss=0.01206, audio_tagging_loss=0.01555, over 1213157.70 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:04:43,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. 
limit=10.0 2023-11-28 22:04:44,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.649e+01 9.823e+01 1.051e+02 1.142e+02 1.295e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-28 22:04:55,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3688073.3333333335, ans=0.1 2023-11-28 22:04:55,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-28 22:05:01,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-28 22:05:08,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3688140.0, ans=0.0 2023-11-28 22:05:10,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3688140.0, ans=0.0 2023-11-28 22:05:20,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3688206.6666666665, ans=0.125 2023-11-28 22:05:23,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3688206.6666666665, ans=0.125 2023-11-28 22:05:23,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3688206.6666666665, ans=15.0 2023-11-28 22:05:36,210 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553250 2023-11-28 22:05:40,249 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 150, loss[loss=0.06231, simple_loss=0.08524, pruned_loss=0.009139, audio_tagging_loss=0.01055, over 15284.00 frames. ], tot_loss[loss=0.07092, simple_loss=0.08958, pruned_loss=0.01197, audio_tagging_loss=0.01416, over 1619221.47 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:05:52,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-28 22:06:00,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3688406.6666666665, ans=0.2 2023-11-28 22:06:39,485 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553300 2023-11-28 22:06:42,858 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 200, loss[loss=0.07934, simple_loss=0.1076, pruned_loss=0.01694, audio_tagging_loss=0.008605, over 14990.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.08946, pruned_loss=0.01197, audio_tagging_loss=0.01259, over 1933397.23 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:06:46,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688673.3333333335, ans=0.1 2023-11-28 22:06:51,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 9.056e+01 9.738e+01 1.064e+02 1.248e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 22:07:22,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3688873.3333333335, ans=0.125 2023-11-28 22:07:29,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3688873.3333333335, ans=0.0 2023-11-28 22:07:31,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-28 22:07:33,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688940.0, ans=0.1 2023-11-28 22:07:41,070 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553350 2023-11-28 22:07:44,523 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 250, loss[loss=0.09321, simple_loss=0.1296, pruned_loss=0.02264, audio_tagging_loss=0.005757, over 15731.00 frames. ], tot_loss[loss=0.06879, simple_loss=0.09075, pruned_loss=0.0122, audio_tagging_loss=0.01122, over 2187638.70 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:07:49,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3689006.6666666665, ans=0.125 2023-11-28 22:08:01,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3689073.3333333335, ans=0.125 2023-11-28 22:08:06,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-28 22:08:10,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5 2023-11-28 22:08:11,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3689140.0, ans=0.025 2023-11-28 22:08:19,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3689140.0, ans=0.125 2023-11-28 22:08:29,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3689206.6666666665, ans=0.125 2023-11-28 22:08:29,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3689206.6666666665, ans=0.1 2023-11-28 22:08:42,571 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553400 2023-11-28 22:08:46,414 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 300, loss[loss=0.06873, simple_loss=0.09809, pruned_loss=0.008004, audio_tagging_loss=0.01168, over 17096.00 frames. ], tot_loss[loss=0.0688, simple_loss=0.09155, pruned_loss=0.0125, audio_tagging_loss=0.01052, over 2384640.64 frames. 
], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:08:48,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3689340.0, ans=0.1 2023-11-28 22:08:55,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 9.275e+01 9.937e+01 1.062e+02 1.967e+02, threshold=1.987e+02, percent-clipped=1.0 2023-11-28 22:09:00,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3689406.6666666665, ans=0.125 2023-11-28 22:09:11,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3689473.3333333335, ans=0.0 2023-11-28 22:09:24,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3689540.0, ans=0.5 2023-11-28 22:09:44,088 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553450 2023-11-28 22:09:48,035 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 350, loss[loss=0.08298, simple_loss=0.1176, pruned_loss=0.01783, audio_tagging_loss=0.00634, over 15524.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09095, pruned_loss=0.01229, audio_tagging_loss=0.01003, over 2545038.90 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:10:01,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3689740.0, ans=0.125 2023-11-28 22:10:21,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-11-28 22:10:22,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3689806.6666666665, ans=0.125 2023-11-28 22:10:27,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-28 22:10:30,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3689873.3333333335, ans=0.0 2023-11-28 22:10:44,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2023-11-28 22:10:44,905 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553500 2023-11-28 22:10:45,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2023-11-28 22:10:48,610 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 400, loss[loss=0.07743, simple_loss=0.1062, pruned_loss=0.01703, audio_tagging_loss=0.007314, over 15323.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09061, pruned_loss=0.01227, audio_tagging_loss=0.009678, over 2658686.53 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:10:54,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3690006.6666666665, ans=0.125 2023-11-28 22:10:56,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 9.027e+01 9.535e+01 1.022e+02 1.341e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 22:11:10,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3690073.3333333335, ans=0.0 2023-11-28 22:11:11,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3690073.3333333335, ans=0.125 2023-11-28 22:11:43,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3690273.3333333335, ans=0.0 2023-11-28 22:11:47,879 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553550 2023-11-28 22:11:51,243 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 450, loss[loss=0.05884, simple_loss=0.0714, pruned_loss=0.01162, audio_tagging_loss=0.01152, over 15763.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0897, pruned_loss=0.01224, audio_tagging_loss=0.009463, over 2744247.64 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:12:00,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3690340.0, ans=0.1 2023-11-28 22:12:09,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3690406.6666666665, ans=0.125 2023-11-28 22:12:16,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3690473.3333333335, ans=0.125 2023-11-28 22:12:20,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3690473.3333333335, ans=0.125 2023-11-28 22:12:22,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3690473.3333333335, ans=0.0 2023-11-28 22:12:48,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.07 vs. limit=10.0 2023-11-28 22:12:48,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553600 2023-11-28 22:12:52,912 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 500, loss[loss=0.04709, simple_loss=0.0582, pruned_loss=0.006045, audio_tagging_loss=0.01194, over 14990.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08921, pruned_loss=0.01236, audio_tagging_loss=0.009263, over 2805264.50 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:12:59,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3690673.3333333335, ans=0.1 2023-11-28 22:13:01,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.926e+01 9.624e+01 1.054e+02 1.218e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 22:13:04,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3690740.0, ans=0.1 2023-11-28 22:13:06,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3690740.0, ans=0.0 2023-11-28 22:13:18,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690806.6666666665, ans=0.1 2023-11-28 22:13:47,647 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:13:51,644 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553650 2023-11-28 22:13:55,647 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 550, loss[loss=0.06045, simple_loss=0.07987, pruned_loss=0.01211, audio_tagging_loss=0.008401, over 15410.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08915, pruned_loss=0.01224, audio_tagging_loss=0.009091, over 2852311.50 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:14:05,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2023-11-28 22:14:40,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3691206.6666666665, ans=0.125 2023-11-28 22:14:53,473 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553700 2023-11-28 22:14:57,457 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 600, loss[loss=0.06542, simple_loss=0.09739, pruned_loss=0.009397, audio_tagging_loss=0.007323, over 15290.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08863, pruned_loss=0.01217, audio_tagging_loss=0.009144, over 2895062.98 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:15:06,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.960e+01 9.634e+01 1.013e+02 1.210e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 22:15:17,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3691406.6666666665, ans=0.1 2023-11-28 22:15:27,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3691473.3333333335, ans=0.125 2023-11-28 22:15:40,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3691540.0, ans=0.0 2023-11-28 22:15:45,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=12.0 2023-11-28 22:15:55,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553750 2023-11-28 22:15:58,957 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 650, loss[loss=0.08625, simple_loss=0.1175, pruned_loss=0.021, audio_tagging_loss=0.006495, over 16646.00 frames. 
], tot_loss[loss=0.06626, simple_loss=0.08981, pruned_loss=0.01229, audio_tagging_loss=0.009059, over 2938648.29 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:16:13,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3691740.0, ans=0.125 2023-11-28 22:16:20,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3691740.0, ans=0.125 2023-11-28 22:16:27,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-28 22:16:50,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3691940.0, ans=0.125 2023-11-28 22:16:56,067 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553800 2023-11-28 22:16:56,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3691940.0, ans=0.0 2023-11-28 22:17:00,542 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 700, loss[loss=0.07403, simple_loss=0.09508, pruned_loss=0.01413, audio_tagging_loss=0.01235, over 15059.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09053, pruned_loss=0.01247, audio_tagging_loss=0.008975, over 2963138.52 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:17:09,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.938e+01 9.507e+01 1.029e+02 1.273e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 22:17:34,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-28 22:17:58,280 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553850 2023-11-28 22:17:58,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3692273.3333333335, ans=0.07 2023-11-28 22:18:02,260 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 750, loss[loss=0.06096, simple_loss=0.0809, pruned_loss=0.01012, audio_tagging_loss=0.01038, over 15529.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09115, pruned_loss=0.01253, audio_tagging_loss=0.008893, over 2978104.70 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:18:03,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3692340.0, ans=0.0 2023-11-28 22:18:05,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3692340.0, ans=0.125 2023-11-28 22:18:30,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3692473.3333333335, ans=0.125 2023-11-28 22:19:00,739 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553900 2023-11-28 22:19:04,213 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 800, loss[loss=0.06618, simple_loss=0.09577, pruned_loss=0.01031, audio_tagging_loss=0.007976, over 15508.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0905, pruned_loss=0.0122, audio_tagging_loss=0.008924, over 2995226.95 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:19:12,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.995e+01 9.559e+01 1.026e+02 1.353e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 22:19:26,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3692740.0, ans=0.1 2023-11-28 22:19:40,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3692873.3333333335, ans=0.125 2023-11-28 22:19:51,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3692873.3333333335, ans=0.0 2023-11-28 22:20:02,045 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553950 2023-11-28 22:20:02,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2023-11-28 22:20:05,583 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 850, loss[loss=0.06976, simple_loss=0.08956, pruned_loss=0.01457, audio_tagging_loss=0.01041, over 15999.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.0904, pruned_loss=0.01214, audio_tagging_loss=0.009056, over 3014398.88 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:20:07,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3693006.6666666665, ans=0.125 2023-11-28 22:20:32,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3693140.0, ans=0.125 2023-11-28 22:20:42,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693206.6666666665, ans=0.1 2023-11-28 22:20:57,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0 2023-11-28 22:20:59,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3693273.3333333335, ans=0.125 2023-11-28 22:21:01,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3693273.3333333335, ans=0.0 2023-11-28 22:21:03,735 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554000 2023-11-28 22:21:07,967 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 900, loss[loss=0.06656, simple_loss=0.09476, pruned_loss=0.01183, audio_tagging_loss=0.007347, over 15944.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08986, pruned_loss=0.01204, audio_tagging_loss=0.00912, over 3020887.97 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:21:14,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3693340.0, ans=0.0 2023-11-28 22:21:16,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.970e+01 9.672e+01 1.016e+02 1.262e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 22:21:16,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3693340.0, ans=0.2 2023-11-28 22:21:17,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3693340.0, ans=0.1 2023-11-28 22:21:48,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3693540.0, ans=0.125 2023-11-28 22:21:49,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3693540.0, ans=0.0 2023-11-28 22:22:06,217 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554050 2023-11-28 22:22:10,228 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 950, loss[loss=0.04567, simple_loss=0.04782, pruned_loss=0.007253, audio_tagging_loss=0.01451, over 13884.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09054, pruned_loss=0.01213, audio_tagging_loss=0.008962, over 3033455.62 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:22:40,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3693806.6666666665, ans=0.2 2023-11-28 22:23:07,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554100 2023-11-28 22:23:10,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3694006.6666666665, ans=0.0 2023-11-28 22:23:11,513 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1000, loss[loss=0.05268, simple_loss=0.07003, pruned_loss=0.00764, audio_tagging_loss=0.01003, over 14932.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08939, pruned_loss=0.01193, audio_tagging_loss=0.008805, over 3036194.87 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:23:20,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.063e+01 9.775e+01 1.049e+02 2.458e+02, threshold=1.955e+02, percent-clipped=1.0 2023-11-28 22:23:20,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3694006.6666666665, ans=0.0 2023-11-28 22:23:20,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3694006.6666666665, ans=0.0 2023-11-28 22:23:20,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3694006.6666666665, ans=0.125 2023-11-28 22:23:22,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3694073.3333333335, ans=0.125 2023-11-28 22:23:32,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3694073.3333333335, ans=0.125 2023-11-28 22:23:35,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3694140.0, ans=0.125 2023-11-28 22:23:39,313 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:23:50,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3694206.6666666665, ans=0.2 2023-11-28 22:24:00,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3694273.3333333335, ans=0.0 2023-11-28 22:24:09,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554150 2023-11-28 22:24:13,277 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1050, loss[loss=0.0867, simple_loss=0.1254, pruned_loss=0.0145, audio_tagging_loss=0.00949, over 14420.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08865, pruned_loss=0.01181, audio_tagging_loss=0.008779, over 3035252.74 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:24:19,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-11-28 22:24:43,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3694473.3333333335, ans=0.0 2023-11-28 22:24:58,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3694540.0, ans=0.0 2023-11-28 22:25:03,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3694606.6666666665, ans=0.2 2023-11-28 22:25:04,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. 
2023-11-28 22:25:11,914 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554200 2023-11-28 22:25:15,610 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1100, loss[loss=0.07927, simple_loss=0.1135, pruned_loss=0.01467, audio_tagging_loss=0.007874, over 14884.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08825, pruned_loss=0.01183, audio_tagging_loss=0.008755, over 3035363.96 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:25:19,619 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:25:24,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.004e+01 9.578e+01 1.033e+02 1.285e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 22:25:32,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3694740.0, ans=0.125 2023-11-28 22:25:33,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3694740.0, ans=0.125 2023-11-28 22:25:37,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3694740.0, ans=0.125 2023-11-28 22:25:41,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3694806.6666666665, ans=0.0 2023-11-28 22:25:42,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3694806.6666666665, ans=0.125 2023-11-28 22:25:57,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3694873.3333333335, ans=0.125 2023-11-28 22:26:13,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554250 2023-11-28 22:26:17,340 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1150, loss[loss=0.06279, simple_loss=0.08739, pruned_loss=0.01119, audio_tagging_loss=0.00791, over 15274.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08868, pruned_loss=0.01196, audio_tagging_loss=0.008586, over 3034322.42 frames.
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:26:36,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3695073.3333333335, ans=0.1 2023-11-28 22:26:44,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3695140.0, ans=0.125 2023-11-28 22:26:53,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3695140.0, ans=0.09899494936611666 2023-11-28 22:27:11,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3695273.3333333335, ans=0.0 2023-11-28 22:27:11,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3695273.3333333335, ans=0.0 2023-11-28 22:27:15,933 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554300 2023-11-28 22:27:19,255 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1200, loss[loss=0.05584, simple_loss=0.07316, pruned_loss=0.009935, audio_tagging_loss=0.00933, over 16199.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08855, pruned_loss=0.01188, audio_tagging_loss=0.008588, over 3042663.11 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:27:27,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.745e+01 9.451e+01 1.036e+02 1.471e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 22:27:44,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3695473.3333333335, ans=0.0 2023-11-28 22:27:55,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3695540.0, ans=0.0 2023-11-28 22:27:56,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5 2023-11-28 22:28:00,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.98 vs. limit=22.5 2023-11-28 22:28:16,955 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554350 2023-11-28 22:28:20,962 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1250, loss[loss=0.05569, simple_loss=0.07372, pruned_loss=0.01013, audio_tagging_loss=0.008703, over 14775.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08846, pruned_loss=0.01185, audio_tagging_loss=0.008531, over 3041974.00 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:28:23,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3695673.3333333335, ans=0.5 2023-11-28 22:28:32,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3695740.0, ans=0.0 2023-11-28 22:28:33,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3695740.0, ans=0.125 2023-11-28 22:28:49,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3695806.6666666665, ans=0.0 2023-11-28 22:29:11,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3695940.0, ans=0.0 2023-11-28 22:29:11,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3695940.0, ans=0.0 2023-11-28 22:29:14,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3695940.0, ans=0.1 2023-11-28 22:29:18,980 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554400 2023-11-28 22:29:22,770 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1300, loss[loss=0.07273, simple_loss=0.09697, pruned_loss=0.0133, audio_tagging_loss=0.01094, over 15192.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08899, pruned_loss=0.01181, audio_tagging_loss=0.008506, over 3044886.26 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:29:26,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3696006.6666666665, ans=0.125 2023-11-28 22:29:30,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 9.038e+01 9.627e+01 1.019e+02 1.676e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 22:29:46,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3696140.0, ans=0.125 2023-11-28 22:30:07,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3696206.6666666665, ans=0.0 2023-11-28 22:30:21,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554450 2023-11-28 22:30:24,558 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1350, loss[loss=0.06939, simple_loss=0.1028, pruned_loss=0.01247, audio_tagging_loss=0.005517, over 14992.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08833, pruned_loss=0.01168, audio_tagging_loss=0.008549, over 3038120.54 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:30:34,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3696340.0, ans=0.125 2023-11-28 22:31:09,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3696540.0, ans=0.125 2023-11-28 22:31:10,319 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:31:14,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2023-11-28 22:31:18,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3696606.6666666665, ans=0.1 2023-11-28 22:31:22,800 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554500 2023-11-28 22:31:25,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3696673.3333333335, ans=0.0 2023-11-28 22:31:26,133 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1400, loss[loss=0.06394, simple_loss=0.08916, pruned_loss=0.009186, audio_tagging_loss=0.01017, over 16045.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08856, pruned_loss=0.01172, audio_tagging_loss=0.008567, over 3050290.78 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:31:27,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3696673.3333333335, ans=0.025 2023-11-28 22:31:32,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3696673.3333333335, ans=0.125 2023-11-28 22:31:35,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.987e+01 9.786e+01 1.046e+02 1.300e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-28 22:31:36,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3696673.3333333335, ans=0.5 2023-11-28 22:31:36,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3696673.3333333335, ans=0.125 2023-11-28 22:31:36,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3696673.3333333335, ans=0.125 2023-11-28 22:31:57,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3696806.6666666665, ans=0.125 2023-11-28 22:32:24,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554550 2023-11-28 22:32:28,202 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1450, loss[loss=0.06119, simple_loss=0.08321, pruned_loss=0.01146, audio_tagging_loss=0.008122, over 15932.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08782, pruned_loss=0.01163, audio_tagging_loss=0.008656, over 3047646.82 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:32:28,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3697006.6666666665, ans=0.5 2023-11-28 22:32:49,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. 
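limit=15.0

The ScheduledFloat records that dominate this log (scaling.py:213) track hyperparameters that are deliberately not constants: dropout probabilities, skip rates, balancer probabilities and bypass scales are scheduled against the global batch count, and each record reports the value in effect (ans=...) at the given batch_count. A minimal sketch of that behaviour, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real class in scaling.py carries extra machinery (a shared batch counter, arithmetic operators, defaults):

```python
class ScheduledFloat:
    """Hyperparameter scheduled on the global batch count (sketch)."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # e.g. (0.0, 0.3), (20000.0, 0.1)
        self.batch_count = 0.0        # advanced by the training loop

    def __float__(self) -> float:
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]
```

At batch_count around 3.7e6 every schedule has long since reached its final breakpoint, which is why the same few values recur throughout this section: ans=0.0 for the skip rates, ans=0.1 for the out_proj dropouts, ans=0.125 for the balancer probabilities.
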
2023-11-28 22:32:57,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3697140.0, ans=0.0 2023-11-28 22:32:59,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=22.5 2023-11-28 22:33:00,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3697140.0, ans=0.0 2023-11-28 22:33:25,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554600 2023-11-28 22:33:29,729 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1500, loss[loss=0.06126, simple_loss=0.0821, pruned_loss=0.009424, audio_tagging_loss=0.01079, over 15167.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08915, pruned_loss=0.01191, audio_tagging_loss=0.008534, over 3048552.30 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:33:35,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3697340.0, ans=0.125 2023-11-28 22:33:39,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3697340.0, ans=0.125 2023-11-28 22:33:40,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.916e+01 9.599e+01 1.025e+02 1.569e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 22:33:54,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0 2023-11-28 22:33:58,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.99 vs. limit=22.5 2023-11-28 22:34:18,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3697606.6666666665, ans=0.05 2023-11-28 22:34:26,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3697606.6666666665, ans=0.0 2023-11-28 22:34:27,829 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554650 2023-11-28 22:34:28,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3697606.6666666665, ans=0.07 2023-11-28 22:34:31,282 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1550, loss[loss=0.05477, simple_loss=0.06859, pruned_loss=0.009562, audio_tagging_loss=0.01092, over 14677.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08875, pruned_loss=0.01188, audio_tagging_loss=0.00861, over 3047061.35 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:34:38,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3697673.3333333335, ans=0.0 2023-11-28 22:35:09,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3697873.3333333335, ans=0.2 2023-11-28 22:35:29,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554700 2023-11-28 22:35:32,750 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1600, loss[loss=0.07489, simple_loss=0.1014, pruned_loss=0.01565, audio_tagging_loss=0.008527, over 14396.00 frames.
], tot_loss[loss=0.06467, simple_loss=0.08852, pruned_loss=0.01174, audio_tagging_loss=0.00867, over 3045311.32 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:35:35,366 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:35:41,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3698006.6666666665, ans=6.0 2023-11-28 22:35:44,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.984e+01 9.580e+01 1.035e+02 1.494e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 22:35:50,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3698073.3333333335, ans=0.04949747468305833 2023-11-28 22:35:51,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698073.3333333335, ans=0.1 2023-11-28 22:35:51,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2023-11-28 22:35:55,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3698073.3333333335, ans=0.125 2023-11-28 22:36:24,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698273.3333333335, ans=0.1 2023-11-28 22:36:31,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554750 2023-11-28 22:36:31,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3698273.3333333335, ans=0.125 2023-11-28 22:36:34,186 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:36:35,090 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1650, loss[loss=0.06668, simple_loss=0.09065, pruned_loss=0.01085, audio_tagging_loss=0.01051, over 15308.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08916, pruned_loss=0.0119, audio_tagging_loss=0.008795, over 3044438.69 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:36:42,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3698340.0, ans=0.0 2023-11-28 22:36:52,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1 2023-11-28 22:36:58,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3698473.3333333335, ans=0.125 2023-11-28 22:37:32,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698606.6666666665, ans=0.1 2023-11-28 22:37:33,137 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554800 2023-11-28 22:37:36,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3698673.3333333335, ans=0.125 2023-11-28 22:37:37,053 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1700, loss[loss=0.04604, simple_loss=0.05611, pruned_loss=0.008669, audio_tagging_loss=0.009316, over 15407.00 frames. 
], tot_loss[loss=0.06518, simple_loss=0.08901, pruned_loss=0.01186, audio_tagging_loss=0.008817, over 3046291.10 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:37:39,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5 2023-11-28 22:37:47,559 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.852e+01 9.479e+01 1.004e+02 1.252e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 22:37:51,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3698740.0, ans=0.2 2023-11-28 22:38:04,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2023-11-28 22:38:33,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3698940.0, ans=0.125 2023-11-28 22:38:34,907 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554850 2023-11-28 22:38:37,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-28 22:38:38,833 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1750, loss[loss=0.07125, simple_loss=0.09742, pruned_loss=0.014, audio_tagging_loss=0.008538, over 15469.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08973, pruned_loss=0.0121, audio_tagging_loss=0.008721, over 3049342.12 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:38:48,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699006.6666666665, ans=0.1 2023-11-28 22:38:50,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2023-11-28 22:39:05,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3699140.0, ans=0.125 2023-11-28 22:39:19,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3699206.6666666665, ans=0.0 2023-11-28 22:39:35,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699273.3333333335, ans=0.1 2023-11-28 22:39:36,318 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554900 2023-11-28 22:39:40,434 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1800, loss[loss=0.08972, simple_loss=0.1329, pruned_loss=0.01791, audio_tagging_loss=0.005361, over 16320.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09026, pruned_loss=0.01216, audio_tagging_loss=0.008565, over 3045086.86 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:39:43,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3699340.0, ans=0.125 2023-11-28 22:39:44,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3699340.0, ans=0.125 2023-11-28 22:39:49,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3699340.0, ans=0.125 2023-11-28 22:39:51,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 9.095e+01 9.843e+01 1.068e+02 2.957e+02, threshold=1.969e+02, percent-clipped=2.0 2023-11-28 22:39:57,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3699406.6666666665, ans=0.125 2023-11-28 22:39:59,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.57 vs. limit=15.0 2023-11-28 22:40:06,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3699473.3333333335, ans=0.125 2023-11-28 22:40:27,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-11-28 22:40:28,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3699606.6666666665, ans=0.125 2023-11-28 22:40:38,711 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554950 2023-11-28 22:40:42,255 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1850, loss[loss=0.07866, simple_loss=0.1142, pruned_loss=0.01414, audio_tagging_loss=0.007407, over 15060.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09061, pruned_loss=0.01224, audio_tagging_loss=0.008549, over 3044407.11 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:40:45,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2023-11-28 22:40:54,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-28 22:41:16,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3699806.6666666665, ans=0.125 2023-11-28 22:41:21,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3699873.3333333335, ans=0.0 2023-11-28 22:41:38,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3699940.0, ans=0.125 2023-11-28 22:41:40,480 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555000 2023-11-28 22:41:41,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3699940.0, ans=0.125 2023-11-28 22:41:44,222 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1900, loss[loss=0.07663, simple_loss=0.1044, pruned_loss=0.01836, audio_tagging_loss=0.006053, over 16115.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0899, pruned_loss=0.01227, audio_tagging_loss=0.00844, over 3041367.65 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:41:49,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3700006.6666666665, ans=0.125 2023-11-28 22:41:55,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.917e+01 9.676e+01 1.038e+02 1.630e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 22:42:42,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555050 2023-11-28 22:42:45,384 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1950, loss[loss=0.0465, simple_loss=0.05802, pruned_loss=0.005557, audio_tagging_loss=0.01193, over 15477.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09041, pruned_loss=0.01222, audio_tagging_loss=0.008411, over 3039270.10 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:42:56,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-28 22:42:57,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3700406.6666666665, ans=0.1 2023-11-28 22:42:58,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3700406.6666666665, ans=0.2 2023-11-28 22:43:09,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3700473.3333333335, ans=0.125 2023-11-28 22:43:20,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3700473.3333333335, ans=0.125 2023-11-28 22:43:40,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3700606.6666666665, ans=0.0 2023-11-28 22:43:43,575 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555100 2023-11-28 22:43:44,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3700606.6666666665, ans=0.125 2023-11-28 22:43:46,959 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2000, loss[loss=0.06954, simple_loss=0.09478, pruned_loss=0.009511, audio_tagging_loss=0.01264, over 15178.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08899, pruned_loss=0.01194, audio_tagging_loss=0.008518, over 3034920.15 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:43:58,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.916e+01 9.601e+01 1.024e+02 1.438e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 22:44:21,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.09 vs. limit=22.5 2023-11-28 22:44:24,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3700873.3333333335, ans=0.125 2023-11-28 22:44:34,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.52 vs. 
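limit=22.5

The batch summaries from train_asr.py:1235 decompose the training objective: each loss[...] block is the current batch, each tot_loss[...] a running average over roughly three million frames. As far as these numbers show, the printed loss is the weighted sum 0.5 x simple_loss + pruned_loss + audio_tagging_loss, consistent with the simple_loss_scale of 0.5 and audio_tagging_loss_scale of 1.0 this run was launched with (the pruned term appears unscaled after warm-up). A quick check against the batch 2000 averages above, under that assumption:

```python
# tot_loss components logged for Epoch 47, batch 2000
simple_loss, pruned_loss, audio_tagging_loss = 0.08899, 0.01194, 0.008518

loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(round(loss, 5))  # 0.06495, matching the logged loss=0.06496 up to rounding
```

The same identity holds for the other summaries in this section, e.g. 0.5 x 0.08781 + 0.01171 + 0.008575 = 0.06419 for batch 2050 just below.
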
2023-11-28 22:44:45,214 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555150 2023-11-28 22:44:48,545 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2050, loss[loss=0.07616, simple_loss=0.1051, pruned_loss=0.01718, audio_tagging_loss=0.006454, over 16366.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08781, pruned_loss=0.01171, audio_tagging_loss=0.008575, over 3039246.13 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:44:50,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3701006.6666666665, ans=0.1 2023-11-28 22:44:51,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3701006.6666666665, ans=0.125 2023-11-28 22:44:55,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-11-28 22:45:06,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3701073.3333333335, ans=0.1 2023-11-28 22:45:30,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3701206.6666666665, ans=0.125 2023-11-28 22:45:46,320 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555200 2023-11-28 22:45:50,150 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2100, loss[loss=0.05502, simple_loss=0.07205, pruned_loss=0.01026, audio_tagging_loss=0.008732, over 14471.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08846, pruned_loss=0.01171, audio_tagging_loss=0.008496, over 3042661.04 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:45:52,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3701340.0, ans=0.1 2023-11-28 22:46:02,553 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.958e+01 9.568e+01 1.025e+02 1.229e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 22:46:14,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3701473.3333333335, ans=0.0 2023-11-28 22:46:22,668 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:46:47,996 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555250 2023-11-28 22:46:49,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3701606.6666666665, ans=0.0 2023-11-28 22:46:52,234 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2150, loss[loss=0.07137, simple_loss=0.08978, pruned_loss=0.01772, audio_tagging_loss=0.008764, over 14840.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08882, pruned_loss=0.01181, audio_tagging_loss=0.00856, over 3036473.73 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:46:59,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=22.5 2023-11-28 22:47:14,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs.
limit=15.0 2023-11-28 22:47:30,079 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:47:50,684 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555300 2023-11-28 22:47:53,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2023-11-28 22:47:54,167 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2200, loss[loss=0.06695, simple_loss=0.09591, pruned_loss=0.01, audio_tagging_loss=0.008988, over 15496.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08902, pruned_loss=0.01175, audio_tagging_loss=0.008576, over 3040032.13 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:48:06,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.046e+01 9.585e+01 1.059e+02 1.446e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 22:48:07,850 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:48:08,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702073.3333333335, ans=0.1 2023-11-28 22:48:13,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3702073.3333333335, ans=0.125 2023-11-28 22:48:14,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3702073.3333333335, ans=0.125 2023-11-28 22:48:18,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3702140.0, ans=0.0 2023-11-28 22:48:21,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3702140.0, ans=0.125 2023-11-28 22:48:27,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3702140.0, ans=0.125 2023-11-28 22:48:35,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3702206.6666666665, ans=0.125 2023-11-28 22:48:35,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702206.6666666665, ans=0.1 2023-11-28 22:48:38,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3702206.6666666665, ans=0.125 2023-11-28 22:48:42,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702273.3333333335, ans=0.1 2023-11-28 22:48:52,163 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555350 2023-11-28 22:48:53,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3702273.3333333335, ans=0.125 2023-11-28 22:48:53,904 INFO [scaling.py:1022] (3/4) Whitening: 
name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-28 22:48:55,614 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2250, loss[loss=0.05731, simple_loss=0.07073, pruned_loss=0.009728, audio_tagging_loss=0.01221, over 15668.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08838, pruned_loss=0.01161, audio_tagging_loss=0.008666, over 3041801.59 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:49:08,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3702406.6666666665, ans=0.05 2023-11-28 22:49:37,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3702540.0, ans=0.0 2023-11-28 22:49:44,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3702606.6666666665, ans=0.125 2023-11-28 22:49:44,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3702606.6666666665, ans=0.1 2023-11-28 22:49:52,828 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555400 2023-11-28 22:49:52,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3702606.6666666665, ans=0.0 2023-11-28 22:49:56,834 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2300, loss[loss=0.07413, simple_loss=0.1077, pruned_loss=0.01226, audio_tagging_loss=0.008009, over 15185.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08909, pruned_loss=0.01159, audio_tagging_loss=0.008657, over 3042831.14 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:50:09,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.895e+01 9.474e+01 1.045e+02 1.271e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 22:50:11,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3702740.0, ans=0.125 2023-11-28 22:50:19,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3702740.0, ans=0.125 2023-11-28 22:50:25,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702806.6666666665, ans=0.1 2023-11-28 22:50:27,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3702806.6666666665, ans=0.125 2023-11-28 22:50:28,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702806.6666666665, ans=0.1 2023-11-28 22:50:31,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3702806.6666666665, ans=0.0 2023-11-28 22:50:41,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702873.3333333335, ans=0.1 2023-11-28 22:50:51,522 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:50:53,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3702940.0, ans=0.0 2023-11-28 22:50:55,222 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555450 2023-11-28 22:50:55,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3702940.0, ans=0.2 2023-11-28 22:50:56,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702940.0, ans=0.1 2023-11-28 22:50:58,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.16 vs. limit=22.5 2023-11-28 22:50:58,627 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2350, loss[loss=0.07036, simple_loss=0.09932, pruned_loss=0.01148, audio_tagging_loss=0.00922, over 15394.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08946, pruned_loss=0.01167, audio_tagging_loss=0.008735, over 3042655.35 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:51:05,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3703006.6666666665, ans=0.125 2023-11-28 22:51:14,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-11-28 22:51:16,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3703073.3333333335, ans=0.0 2023-11-28 22:51:42,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3703206.6666666665, ans=0.1 2023-11-28 22:51:56,445 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555500 2023-11-28 22:51:57,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3703273.3333333335, ans=0.0 2023-11-28 22:51:59,834 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2400, loss[loss=0.05646, simple_loss=0.07103, pruned_loss=0.008794, audio_tagging_loss=0.01215, over 16656.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.0897, pruned_loss=0.01181, audio_tagging_loss=0.008866, over 3040098.89 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:52:11,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.887e+01 9.633e+01 1.018e+02 1.587e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 22:52:51,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3703606.6666666665, ans=0.0 2023-11-28 22:52:55,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3703606.6666666665, ans=0.1 2023-11-28 22:52:57,411 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555550 2023-11-28 22:53:01,554 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2450, loss[loss=0.07717, simple_loss=0.119, pruned_loss=0.01254, audio_tagging_loss=0.005111, over 16099.00 frames. 
], tot_loss[loss=0.06571, simple_loss=0.08992, pruned_loss=0.01188, audio_tagging_loss=0.008867, over 3038016.13 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:53:14,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3703740.0, ans=0.0 2023-11-28 22:53:15,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3703740.0, ans=0.1 2023-11-28 22:53:15,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3703740.0, ans=0.2 2023-11-28 22:53:28,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2023-11-28 22:53:34,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3703806.6666666665, ans=0.035 2023-11-28 22:53:46,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3703873.3333333335, ans=0.125 2023-11-28 22:53:55,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3703940.0, ans=0.0 2023-11-28 22:53:59,815 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555600 2023-11-28 22:54:04,204 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2500, loss[loss=0.06439, simple_loss=0.09559, pruned_loss=0.007728, audio_tagging_loss=0.008865, over 15594.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09003, pruned_loss=0.01197, audio_tagging_loss=0.008848, over 3034013.64 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:54:10,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-11-28 22:54:17,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.036e+01 9.436e+01 1.021e+02 1.491e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 22:54:30,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3704140.0, ans=0.2 2023-11-28 22:54:37,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3704140.0, ans=0.0 2023-11-28 22:54:51,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3704206.6666666665, ans=0.0 2023-11-28 22:54:56,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3704273.3333333335, ans=0.2 2023-11-28 22:54:58,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-28 22:54:59,578 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. 
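limit=15.0

The optim.py:476 records document ScaledAdam's gradient clipping. Each prints five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, the active clipping threshold, and the share of batches clipped. The numbers are consistent with the threshold being Clipping_scale times the median norm, which is how I read these lines; the exact bookkeeping (window length, update cadence) lives in optim.py:

```python
# Quantiles from the optim.py record just above:
# 7.711e+01  9.036e+01  9.436e+01  1.021e+02  1.491e+02
clipping_scale = 2.0          # the logged Clipping_scale
median_grad_norm = 9.436e+01  # middle quantile

threshold = clipping_scale * median_grad_norm
print(threshold)  # 188.72, i.e. the logged threshold=1.887e+02
```

percent-clipped=0.0 on most records means no batch in the window exceeded the threshold; the occasional percent-clipped=1.0 or 2.0 earlier in this section coincides with a max quantile (2.458e+02, 2.957e+02) landing above it.
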
2023-11-28 22:55:00,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3704273.3333333335, ans=0.125 2023-11-28 22:55:02,976 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555650 2023-11-28 22:55:06,518 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2550, loss[loss=0.06395, simple_loss=0.09143, pruned_loss=0.01062, audio_tagging_loss=0.007616, over 15484.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08933, pruned_loss=0.01191, audio_tagging_loss=0.008793, over 3030592.04 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:55:29,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704473.3333333335, ans=0.1 2023-11-28 22:55:32,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3704473.3333333335, ans=0.2 2023-11-28 22:55:40,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3704473.3333333335, ans=0.1 2023-11-28 22:56:02,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.61 vs. limit=10.0 2023-11-28 22:56:02,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=22.5 2023-11-28 22:56:04,083 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555700 2023-11-28 22:56:07,444 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2600, loss[loss=0.05884, simple_loss=0.08181, pruned_loss=0.01081, audio_tagging_loss=0.007119, over 15758.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.0892, pruned_loss=0.01194, audio_tagging_loss=0.008643, over 3031757.59 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:56:20,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.832e+01 9.497e+01 1.024e+02 1.176e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 22:56:42,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3704806.6666666665, ans=0.125 2023-11-28 22:56:48,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3704873.3333333335, ans=0.1 2023-11-28 22:57:03,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3704940.0, ans=0.125 2023-11-28 22:57:05,592 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555750 2023-11-28 22:57:08,999 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2650, loss[loss=0.06792, simple_loss=0.09029, pruned_loss=0.01482, audio_tagging_loss=0.007959, over 15401.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08959, pruned_loss=0.01196, audio_tagging_loss=0.008587, over 3035398.23 frames.
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:57:28,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3705073.3333333335, ans=0.1 2023-11-28 22:57:36,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3705140.0, ans=0.025 2023-11-28 22:57:40,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3705140.0, ans=0.0 2023-11-28 22:57:51,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.40 vs. limit=6.0 2023-11-28 22:58:07,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555800 2023-11-28 22:58:09,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=10.0 2023-11-28 22:58:11,031 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2700, loss[loss=0.05669, simple_loss=0.07904, pruned_loss=0.007616, audio_tagging_loss=0.009556, over 14957.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08908, pruned_loss=0.0118, audio_tagging_loss=0.008604, over 3038039.99 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:58:16,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3705340.0, ans=0.0 2023-11-28 22:58:19,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=12.0 2023-11-28 22:58:24,444 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 9.013e+01 9.562e+01 1.012e+02 1.188e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 22:58:41,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3705473.3333333335, ans=0.125 2023-11-28 22:59:02,337 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:59:09,252 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555850 2023-11-28 22:59:12,647 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2750, loss[loss=0.06335, simple_loss=0.08965, pruned_loss=0.01022, audio_tagging_loss=0.008304, over 15079.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08891, pruned_loss=0.01167, audio_tagging_loss=0.008455, over 3039324.35 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:00:07,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2023-11-28 23:00:07,602 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
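Number of tokens: 24

These WARNING records from train_asr.py:1481 all follow the same pattern: a one-second AudioSet clip (100 feature frames) shrinks to 23 frames after the encoder's roughly 4x subsampling, while its placeholder transcript tokenizes to 24 BPE tokens. A transducer loss cannot emit more tokens than it has encoder frames, so such cuts are dropped before training. A sketch of the check these numbers imply, using the post-subsampling frame count the log itself reports (the margin and rounding in the real check may differ):

```python
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    """Drop cuts whose transcript is longer than the encoder output;
    the transducer loss needs at least one frame per emitted token."""
    return frames_after_subsampling >= num_tokens

print(keep_cut(23, 24))  # False -> "Exclude cut ... from training"
```

Only AudioSet cuts trip this check, because their transcripts are the fixed dummy sentence; real LibriSpeech utterances have far more frames than tokens.
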
2023-11-28 23:00:10,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555900 2023-11-28 23:00:12,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=12.0 2023-11-28 23:00:13,451 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2800, loss[loss=0.07609, simple_loss=0.1162, pruned_loss=0.01164, audio_tagging_loss=0.006341, over 14485.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08872, pruned_loss=0.01161, audio_tagging_loss=0.008517, over 3039720.05 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:00:21,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0 2023-11-28 23:00:27,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.973e+01 9.470e+01 1.013e+02 1.282e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 23:00:30,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3706073.3333333335, ans=0.1 2023-11-28 23:00:42,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3706140.0, ans=0.125 2023-11-28 23:01:02,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3706273.3333333335, ans=0.125 2023-11-28 23:01:12,302 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555950 2023-11-28 23:01:16,209 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2850, loss[loss=0.06522, simple_loss=0.09394, pruned_loss=0.01363, audio_tagging_loss=0.004619, over 15558.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08861, pruned_loss=0.01168, audio_tagging_loss=0.008441, over 3051211.90 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:01:27,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3706406.6666666665, ans=0.125 2023-11-28 23:01:29,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3706406.6666666665, ans=0.125 2023-11-28 23:01:50,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=12.0 2023-11-28 23:02:03,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3706540.0, ans=0.0 2023-11-28 23:02:07,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2023-11-28 23:02:14,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556000 2023-11-28 23:02:16,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3706606.6666666665, ans=0.0 2023-11-28 23:02:21,338 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2900, loss[loss=0.07652, simple_loss=0.1022, pruned_loss=0.01472, audio_tagging_loss=0.01071, over 16589.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.0892, pruned_loss=0.01172, audio_tagging_loss=0.008469, over 3051570.66 frames.
], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:02:31,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3706673.3333333335, ans=0.1 2023-11-28 23:02:34,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.955e+01 9.573e+01 1.059e+02 1.416e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 23:02:40,826 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:03:00,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3706873.3333333335, ans=0.0 2023-11-28 23:03:07,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3706873.3333333335, ans=0.125 2023-11-28 23:03:08,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3706873.3333333335, ans=0.2 2023-11-28 23:03:15,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3706940.0, ans=0.1 2023-11-28 23:03:19,315 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556050 2023-11-28 23:03:22,903 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2950, loss[loss=0.07353, simple_loss=0.1005, pruned_loss=0.01578, audio_tagging_loss=0.007488, over 15230.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.09012, pruned_loss=0.01191, audio_tagging_loss=0.008404, over 3053950.18 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:03:50,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3707140.0, ans=0.0 2023-11-28 23:03:53,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3707140.0, ans=0.2 2023-11-28 23:04:04,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2023-11-28 23:04:09,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3707206.6666666665, ans=0.125 2023-11-28 23:04:14,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0 2023-11-28 23:04:16,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3707273.3333333335, ans=0.125 2023-11-28 23:04:21,473 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556100 2023-11-28 23:04:21,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3707273.3333333335, ans=0.125 2023-11-28 23:04:22,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2023-11-28 23:04:24,933 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3000, loss[loss=0.05898, simple_loss=0.07728, pruned_loss=0.01179, audio_tagging_loss=0.008549, over 15596.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08959, pruned_loss=0.01188, audio_tagging_loss=0.008543, over 3060455.91 frames. 
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:04:24,934 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-28 23:05:04,343 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05749, simple_loss=0.05049, pruned_loss=0.005328, audio_tagging_loss=0.02692, over 4681554.00 frames. 2023-11-28 23:05:04,344 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-28 23:05:06,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=3707340.0, ans=12.0 2023-11-28 23:05:13,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3707340.0, ans=0.125 2023-11-28 23:05:20,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 9.232e+01 9.628e+01 1.042e+02 1.260e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 23:05:20,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3707406.6666666665, ans=0.2 2023-11-28 23:05:37,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3707473.3333333335, ans=0.2 2023-11-28 23:05:45,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2023-11-28 23:05:54,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3707606.6666666665, ans=0.0 2023-11-28 23:06:02,529 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556150 2023-11-28 23:06:04,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3707673.3333333335, ans=0.125 2023-11-28 23:06:05,936 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3050, loss[loss=0.09539, simple_loss=0.124, pruned_loss=0.02721, audio_tagging_loss=0.006166, over 14820.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08929, pruned_loss=0.01185, audio_tagging_loss=0.00863, over 3061307.39 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:06:11,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3707673.3333333335, ans=0.125 2023-11-28 23:06:21,539 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:06:44,909 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
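
The validation pass above reports loss=0.05749 with audio_tagging_loss=0.02692 as the largest weighted term (0.5 * 0.05049 + 0.005328 + 0.02692 ~= 0.05749), followed by the peak-memory line. The 24894MB figure is the kind of number PyTorch's peak-allocation counter returns; a hedged sketch of producing such a line:

    import torch

    def log_peak_memory(device: torch.device) -> None:
        # Peak bytes ever allocated on this device (since program start, or
        # since the last reset_peak_memory_stats call), reported in MB.
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")
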
Number of tokens: 24 2023-11-28 23:06:54,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3707940.0, ans=0.2 2023-11-28 23:07:03,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3707940.0, ans=0.125 2023-11-28 23:07:03,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3707940.0, ans=0.125 2023-11-28 23:07:04,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556200 2023-11-28 23:07:08,264 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3100, loss[loss=0.05605, simple_loss=0.07231, pruned_loss=0.008919, audio_tagging_loss=0.01097, over 14023.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08951, pruned_loss=0.01195, audio_tagging_loss=0.008647, over 3055272.26 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:07:23,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 9.064e+01 9.672e+01 1.048e+02 1.274e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 23:07:41,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3708140.0, ans=0.0 2023-11-28 23:08:05,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556250 2023-11-28 23:08:08,490 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3150, loss[loss=0.07562, simple_loss=0.1061, pruned_loss=0.01428, audio_tagging_loss=0.008315, over 15449.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08967, pruned_loss=0.01183, audio_tagging_loss=0.008689, over 3059144.08 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:08:42,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2023-11-28 23:08:52,309 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:09:07,547 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556300 2023-11-28 23:09:10,926 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3200, loss[loss=0.05344, simple_loss=0.07231, pruned_loss=0.006529, audio_tagging_loss=0.01076, over 15532.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09012, pruned_loss=0.01204, audio_tagging_loss=0.008731, over 3055528.34 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:09:20,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3708673.3333333335, ans=0.0 2023-11-28 23:09:26,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.731e+01 9.590e+01 1.027e+02 1.409e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 23:09:36,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.54 vs. 
limit=12.0 2023-11-28 23:09:36,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3708806.6666666665, ans=0.2 2023-11-28 23:09:48,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3708873.3333333335, ans=0.025 2023-11-28 23:10:09,028 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556350 2023-11-28 23:10:12,489 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3250, loss[loss=0.04609, simple_loss=0.05579, pruned_loss=0.006472, audio_tagging_loss=0.01172, over 15860.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08893, pruned_loss=0.01184, audio_tagging_loss=0.008921, over 3057054.52 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:10:16,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3709006.6666666665, ans=0.125 2023-11-28 23:10:32,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3709073.3333333335, ans=0.035 2023-11-28 23:10:33,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2023-11-28 23:10:35,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3709140.0, ans=0.0 2023-11-28 23:10:36,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3709140.0, ans=0.04949747468305833 2023-11-28 23:10:54,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3709206.6666666665, ans=0.0 2023-11-28 23:11:04,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3709273.3333333335, ans=0.0 2023-11-28 23:11:04,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3709273.3333333335, ans=0.0 2023-11-28 23:11:07,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3709273.3333333335, ans=0.125 2023-11-28 23:11:10,654 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556400 2023-11-28 23:11:14,523 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3300, loss[loss=0.07791, simple_loss=0.1213, pruned_loss=0.01206, audio_tagging_loss=0.005196, over 14952.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08856, pruned_loss=0.01181, audio_tagging_loss=0.008978, over 3053387.98 frames. 
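
The many scaling.py:213 lines each print a ScheduledFloat: a hyperparameter (dropout_p, the various skip rates, balancer probs, bypass scale_min, whitening limits) whose current value, the "ans" field, is a function of the global batch_count. A minimal piecewise-linear schedule of that shape; this illustrates the idea only, and the breakpoints below are invented:

    import bisect

    class PiecewiseLinearSchedule:
        """Value as a piecewise-linear function of batch_count (illustrative)."""

        def __init__(self, *points: tuple[float, float]):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches; by
    # batch_count ~3.7e6 it sits on the flat tail, hence a constant ans=0.1:
    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(3706940.0))  # -> 0.1
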
], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:11:18,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3709340.0, ans=0.125 2023-11-28 23:11:26,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3709406.6666666665, ans=0.125 2023-11-28 23:11:31,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.994e+01 9.101e+01 9.601e+01 1.014e+02 1.380e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 23:11:34,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. limit=10.0 2023-11-28 23:11:37,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3709406.6666666665, ans=0.125 2023-11-28 23:11:51,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3709540.0, ans=0.0 2023-11-28 23:11:51,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3709540.0, ans=0.0 2023-11-28 23:11:54,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3709540.0, ans=0.0 2023-11-28 23:12:04,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3709606.6666666665, ans=0.0 2023-11-28 23:12:12,437 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556450 2023-11-28 23:12:12,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3709606.6666666665, ans=0.125 2023-11-28 23:12:16,465 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3350, loss[loss=0.05461, simple_loss=0.07368, pruned_loss=0.01064, audio_tagging_loss=0.007127, over 14608.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08837, pruned_loss=0.01189, audio_tagging_loss=0.00887, over 3060142.89 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:12:21,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3709673.3333333335, ans=0.2 2023-11-28 23:12:25,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2023-11-28 23:12:29,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. 
limit=6.0 2023-11-28 23:12:47,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3709806.6666666665, ans=0.2 2023-11-28 23:13:08,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3709940.0, ans=0.025 2023-11-28 23:13:14,716 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556500 2023-11-28 23:13:14,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3709940.0, ans=0.035 2023-11-28 23:13:14,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3709940.0, ans=0.0 2023-11-28 23:13:18,067 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3400, loss[loss=0.06421, simple_loss=0.09688, pruned_loss=0.0111, audio_tagging_loss=0.004663, over 14934.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08885, pruned_loss=0.01187, audio_tagging_loss=0.008769, over 3062895.03 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:13:33,976 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.795e+01 9.500e+01 1.053e+02 1.456e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 23:13:40,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-28 23:13:42,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=12.0 2023-11-28 23:13:57,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3710206.6666666665, ans=0.125 2023-11-28 23:14:01,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3710206.6666666665, ans=0.2 2023-11-28 23:14:03,053 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:14:07,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710273.3333333335, ans=0.1 2023-11-28 23:14:16,445 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556550 2023-11-28 23:14:19,889 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3450, loss[loss=0.07163, simple_loss=0.09702, pruned_loss=0.01408, audio_tagging_loss=0.009037, over 14792.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08966, pruned_loss=0.01198, audio_tagging_loss=0.008664, over 3058756.74 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:14:23,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3710340.0, ans=0.125 2023-11-28 23:14:23,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3710340.0, ans=0.0 2023-11-28 23:14:34,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. 
limit=15.0 2023-11-28 23:14:45,085 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:14:53,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3710473.3333333335, ans=0.0 2023-11-28 23:14:54,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-28 23:14:54,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3710473.3333333335, ans=0.125 2023-11-28 23:14:57,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.58 vs. limit=22.5 2023-11-28 23:14:57,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3710540.0, ans=0.1 2023-11-28 23:15:05,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-28 23:15:09,567 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:15:17,523 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556600 2023-11-28 23:15:21,979 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3500, loss[loss=0.06421, simple_loss=0.09495, pruned_loss=0.009572, audio_tagging_loss=0.00716, over 15269.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08961, pruned_loss=0.01194, audio_tagging_loss=0.008517, over 3063367.40 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:15:23,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3710673.3333333335, ans=0.0 2023-11-28 23:15:27,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-11-28 23:15:33,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3710740.0, ans=0.1 2023-11-28 23:15:35,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3710740.0, ans=0.125 2023-11-28 23:15:38,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.007e+01 9.535e+01 1.020e+02 1.277e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 23:15:48,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3710806.6666666665, ans=0.0 2023-11-28 23:15:53,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3710806.6666666665, ans=0.0 2023-11-28 23:15:56,579 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 23:16:18,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-11-28 23:16:20,360 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556650 2023-11-28 23:16:24,398 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3550, loss[loss=0.07103, simple_loss=0.1049, pruned_loss=0.01337, audio_tagging_loss=0.005242, over 15962.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08868, pruned_loss=0.01185, audio_tagging_loss=0.008493, over 3059252.95 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:16:37,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3711073.3333333335, ans=0.0 2023-11-28 23:16:45,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3711073.3333333335, ans=0.2 2023-11-28 23:16:56,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3711140.0, ans=0.0 2023-11-28 23:16:57,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3711140.0, ans=0.0 2023-11-28 23:17:11,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3711206.6666666665, ans=0.05 2023-11-28 23:17:23,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556700 2023-11-28 23:17:26,709 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3600, loss[loss=0.05641, simple_loss=0.07726, pruned_loss=0.008816, audio_tagging_loss=0.008959, over 14944.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08823, pruned_loss=0.01179, audio_tagging_loss=0.008466, over 3059299.26 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:17:35,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.43 vs. limit=8.0 2023-11-28 23:17:42,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.234e+01 8.750e+01 9.399e+01 1.010e+02 1.318e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 23:17:46,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3711406.6666666665, ans=0.125 2023-11-28 23:18:05,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3711540.0, ans=0.2 2023-11-28 23:18:15,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3711606.6666666665, ans=0.125 2023-11-28 23:18:23,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556750 2023-11-28 23:18:23,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3711606.6666666665, ans=0.0 2023-11-28 23:18:27,657 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3650, loss[loss=0.07336, simple_loss=0.09766, pruned_loss=0.01666, audio_tagging_loss=0.007866, over 15146.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08839, pruned_loss=0.01174, audio_tagging_loss=0.008439, over 3048451.95 frames. 
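
The scaling.py:1022 "Whitening" lines report, for a named activation, a scalar measure of how far the feature covariance is from a multiple of the identity; a decorrelating penalty engages only when the metric exceeds the printed limit, and most entries in this stretch stay below it. The exact metric is internal to scaling.py; a plausible proxy with the same fixed point (1.0 for perfectly white features, larger otherwise, ignoring the num_groups split seen in the log) is:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels); returns 1.0 iff cov(x) = sigma^2 * I.

        Illustrative proxy only, not the formula the training code uses.
        """
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]          # (C, C) feature covariance
        d = cov.shape[0]
        mean_var = cov.diagonal().sum() / d   # average per-channel variance
        # Frobenius mass of cov relative to its value for sigma^2 * I:
        return ((cov ** 2).sum() / (mean_var ** 2 * d)).item()

    # if whitening_metric(act) > limit: add a small penalty pushing cov toward I
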
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:18:46,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3711740.0, ans=0.125 2023-11-28 23:19:00,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3711806.6666666665, ans=0.0 2023-11-28 23:19:06,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3711873.3333333335, ans=10.0 2023-11-28 23:19:25,323 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556800 2023-11-28 23:19:29,792 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3700, loss[loss=0.06305, simple_loss=0.08731, pruned_loss=0.01092, audio_tagging_loss=0.008479, over 15925.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08852, pruned_loss=0.01177, audio_tagging_loss=0.008426, over 3053775.99 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:19:40,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3712006.6666666665, ans=0.125 2023-11-28 23:19:47,497 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.931e+01 9.622e+01 1.040e+02 1.365e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-28 23:19:55,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3712140.0, ans=0.0 2023-11-28 23:19:55,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3712140.0, ans=0.07 2023-11-28 23:20:28,739 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556850 2023-11-28 23:20:32,146 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3750, loss[loss=0.06661, simple_loss=0.09577, pruned_loss=0.0106, audio_tagging_loss=0.00813, over 16491.00 frames. ], tot_loss[loss=0.06414, simple_loss=0.08804, pruned_loss=0.01163, audio_tagging_loss=0.008488, over 3052245.29 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:20:35,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3712340.0, ans=0.125 2023-11-28 23:20:42,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3712406.6666666665, ans=0.0 2023-11-28 23:20:44,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3712406.6666666665, ans=0.125 2023-11-28 23:20:56,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3712473.3333333335, ans=0.1 2023-11-28 23:21:01,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-28 23:21:02,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3712473.3333333335, ans=0.1 2023-11-28 23:21:09,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3712540.0, ans=0.0 2023-11-28 23:21:16,888 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:21:22,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3712606.6666666665, ans=0.0 2023-11-28 23:21:30,042 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556900 2023-11-28 23:21:33,584 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3800, loss[loss=0.05165, simple_loss=0.07059, pruned_loss=0.006104, audio_tagging_loss=0.01025, over 15286.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08897, pruned_loss=0.01199, audio_tagging_loss=0.00854, over 3053783.74 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:21:52,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.439e+01 1.001e+02 1.076e+02 2.686e+02, threshold=2.002e+02, percent-clipped=1.0 2023-11-28 23:21:59,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3712806.6666666665, ans=0.125 2023-11-28 23:22:13,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3712873.3333333335, ans=0.2 2023-11-28 23:22:19,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3712873.3333333335, ans=0.1 2023-11-28 23:22:31,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556950 2023-11-28 23:22:35,416 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3850, loss[loss=0.07241, simple_loss=0.1006, pruned_loss=0.01537, audio_tagging_loss=0.006721, over 14496.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08957, pruned_loss=0.012, audio_tagging_loss=0.00862, over 3054513.45 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:22:44,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3713006.6666666665, ans=0.125 2023-11-28 23:22:50,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3713073.3333333335, ans=0.125 2023-11-28 23:22:51,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3713073.3333333335, ans=0.0 2023-11-28 23:23:09,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3713140.0, ans=0.125 2023-11-28 23:23:28,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3713273.3333333335, ans=0.1 2023-11-28 23:23:34,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557000 2023-11-28 23:23:38,033 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3900, loss[loss=0.07573, simple_loss=0.1044, pruned_loss=0.01452, audio_tagging_loss=0.009027, over 14910.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08911, pruned_loss=0.01181, audio_tagging_loss=0.008702, over 3046682.50 frames. 
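
Every WARNING in this section follows one pattern: a 1-second AudioSet clip (100 feature frames) shrinks to 23 frames after the frontend's roughly 4x subsampling, while its placeholder transcript tokenizes to 24 BPE tokens; with fewer encoder frames than target tokens the transducer loss has no valid alignment, so the cut is excluded. A sketch of such a filter; the subsampling arithmetic is an assumption chosen to map 100 to 23, not the encoder's exact formula:

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed frontend arithmetic (maps 100 -> 23); the real
        # encoder_embed may compute this differently.
        return (num_frames - 7) // 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer alignment needs at least one frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    # The excluded cuts above: keep_cut(100, 24) -> False (23 frames < 24 tokens)
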
], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:23:42,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3713340.0, ans=0.125 2023-11-28 23:23:50,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=15.0 2023-11-28 23:23:55,794 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.938e+01 9.522e+01 1.035e+02 1.409e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 23:23:58,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2023-11-28 23:24:04,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3713473.3333333335, ans=0.2 2023-11-28 23:24:04,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3713473.3333333335, ans=0.07 2023-11-28 23:24:28,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3713606.6666666665, ans=0.0 2023-11-28 23:24:29,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-11-28 23:24:32,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3713606.6666666665, ans=0.125 2023-11-28 23:24:34,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557050 2023-11-28 23:24:38,300 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3950, loss[loss=0.07768, simple_loss=0.1056, pruned_loss=0.0166, audio_tagging_loss=0.008287, over 15977.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08842, pruned_loss=0.01172, audio_tagging_loss=0.008852, over 3054438.53 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:25:06,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3713806.6666666665, ans=0.125 2023-11-28 23:25:33,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2023-11-28 23:25:37,815 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557100 2023-11-28 23:25:41,349 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4000, loss[loss=0.05337, simple_loss=0.07526, pruned_loss=0.008896, audio_tagging_loss=0.006842, over 16141.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08881, pruned_loss=0.01176, audio_tagging_loss=0.008841, over 3052929.30 frames. 
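
The grad_scale field in the batch lines is the fp16 loss-scaling factor: it halves when a scaled gradient overflows (32 -> 16 -> 8 earlier in this stretch) and grows back after clean steps (8 -> 16 here at batch 4000, and on to 32 by batch 4800). That matches dynamic loss scaling of the kind torch.cuda.amp.GradScaler implements; a generic usage sketch, not the actual loop in train_asr.py, whose growth cadence appears to follow its own schedule:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # assumption, chosen to match the values logged here
        growth_factor=2.0,     # doubles the scale on growth
        backoff_factor=0.5,    # halves it on inf/nan gradients
        growth_interval=2000,  # the library default; the recipe likely differs
    )

    def training_step(model, optimizer, criterion, inputs, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(inputs), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # silently skipped if gradients overflowed
        scaler.update()         # where a 32 -> 16 backoff actually happens
        return loss.detach(), scaler.get_scale()
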
], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:25:59,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.920e+01 9.493e+01 1.035e+02 1.641e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 23:26:17,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3714206.6666666665, ans=0.0 2023-11-28 23:26:27,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3714206.6666666665, ans=15.0 2023-11-28 23:26:28,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3714206.6666666665, ans=0.125 2023-11-28 23:26:38,952 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557150 2023-11-28 23:26:42,999 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4050, loss[loss=0.07084, simple_loss=0.09472, pruned_loss=0.01486, audio_tagging_loss=0.008622, over 14690.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08971, pruned_loss=0.01208, audio_tagging_loss=0.00885, over 3051437.19 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:26:47,666 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:26:50,815 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:27:32,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3714606.6666666665, ans=0.125 2023-11-28 23:27:41,451 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557200 2023-11-28 23:27:42,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-11-28 23:27:45,245 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4100, loss[loss=0.07427, simple_loss=0.1036, pruned_loss=0.01314, audio_tagging_loss=0.009327, over 15629.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09019, pruned_loss=0.01206, audio_tagging_loss=0.008826, over 3051931.73 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:27:48,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2023-11-28 23:27:48,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3714673.3333333335, ans=0.0 2023-11-28 23:27:51,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3714673.3333333335, ans=0.125 2023-11-28 23:27:59,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.88 vs. 
limit=15.0 2023-11-28 23:28:03,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 9.138e+01 9.541e+01 1.028e+02 1.498e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 23:28:30,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3714873.3333333335, ans=0.2 2023-11-28 23:28:34,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3714940.0, ans=0.0 2023-11-28 23:28:43,449 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557250 2023-11-28 23:28:45,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-11-28 23:28:46,810 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4150, loss[loss=0.0741, simple_loss=0.09809, pruned_loss=0.01758, audio_tagging_loss=0.007479, over 14981.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08967, pruned_loss=0.01204, audio_tagging_loss=0.008641, over 3050076.79 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:28:53,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3715006.6666666665, ans=0.0 2023-11-28 23:29:09,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=22.5 2023-11-28 23:29:14,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3715140.0, ans=0.125 2023-11-28 23:29:32,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3715206.6666666665, ans=0.125 2023-11-28 23:29:33,551 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:29:44,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557300 2023-11-28 23:29:48,184 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4200, loss[loss=0.05319, simple_loss=0.07979, pruned_loss=0.007111, audio_tagging_loss=0.006186, over 14600.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08961, pruned_loss=0.01196, audio_tagging_loss=0.008544, over 3048365.40 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:29:51,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2023-11-28 23:29:55,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3715340.0, ans=0.125 2023-11-28 23:30:06,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.859e+01 9.416e+01 1.036e+02 1.524e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 23:30:13,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.15 vs. 
limit=15.0 2023-11-28 23:30:27,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3715540.0, ans=0.0 2023-11-28 23:30:29,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0 2023-11-28 23:30:39,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3715606.6666666665, ans=0.0 2023-11-28 23:30:46,462 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557350 2023-11-28 23:30:49,967 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4250, loss[loss=0.05174, simple_loss=0.06499, pruned_loss=0.00835, audio_tagging_loss=0.01089, over 15122.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09063, pruned_loss=0.01207, audio_tagging_loss=0.008461, over 3048282.38 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:31:04,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3715740.0, ans=0.125 2023-11-28 23:31:17,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3715806.6666666665, ans=0.2 2023-11-28 23:31:18,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3715806.6666666665, ans=0.0 2023-11-28 23:31:30,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3715873.3333333335, ans=0.1 2023-11-28 23:31:46,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3715940.0, ans=0.125 2023-11-28 23:31:47,687 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557400 2023-11-28 23:31:51,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2023-11-28 23:31:51,608 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4300, loss[loss=0.0569, simple_loss=0.06981, pruned_loss=0.01192, audio_tagging_loss=0.01008, over 17248.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09071, pruned_loss=0.01209, audio_tagging_loss=0.00841, over 3047646.76 frames. ], batch size: 65, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:31:56,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3716006.6666666665, ans=0.125 2023-11-28 23:32:09,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 9.110e+01 9.607e+01 1.023e+02 1.243e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 23:32:11,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3716073.3333333335, ans=0.07 2023-11-28 23:32:16,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3716140.0, ans=0.125 2023-11-28 23:32:45,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. 
limit=6.0 2023-11-28 23:32:46,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3716273.3333333335, ans=0.2 2023-11-28 23:32:49,424 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557450 2023-11-28 23:32:53,493 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4350, loss[loss=0.07118, simple_loss=0.1041, pruned_loss=0.01297, audio_tagging_loss=0.006164, over 15321.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09064, pruned_loss=0.01201, audio_tagging_loss=0.008354, over 3049144.59 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:33:03,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-11-28 23:33:32,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3716540.0, ans=0.125 2023-11-28 23:33:48,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.38 vs. limit=10.0 2023-11-28 23:33:52,132 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557500 2023-11-28 23:33:53,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3716606.6666666665, ans=0.125 2023-11-28 23:33:55,544 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4400, loss[loss=0.07814, simple_loss=0.1066, pruned_loss=0.01803, audio_tagging_loss=0.006803, over 14998.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.09009, pruned_loss=0.0119, audio_tagging_loss=0.008403, over 3047828.81 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:34:02,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3716673.3333333335, ans=0.0 2023-11-28 23:34:08,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3716740.0, ans=0.125 2023-11-28 23:34:09,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3716740.0, ans=0.035 2023-11-28 23:34:14,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3716740.0, ans=0.0 2023-11-28 23:34:15,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 9.001e+01 9.645e+01 1.064e+02 1.630e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 23:34:30,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3716806.6666666665, ans=0.2 2023-11-28 23:34:53,966 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557550 2023-11-28 23:34:57,368 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4450, loss[loss=0.06956, simple_loss=0.1034, pruned_loss=0.01128, audio_tagging_loss=0.006605, over 15453.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09097, pruned_loss=0.01192, audio_tagging_loss=0.00834, over 3052387.16 frames. 
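
The tot_loss[...] fields are not single-batch numbers: each is an aggregate over roughly the last 3 million frames (about 200 batches at the ~15k frames per batch seen here), which is why tot_loss drifts slowly while the leading loss[...] jumps batch to batch. A decayed-sum tracker that reproduces numbers of this shape; the decay constant is an assumption back-derived from the ~3.05e6-frame totals, not taken from the code:

    class DecayedLossTracker:
        """Exponentially decayed loss and frame sums (illustrative).

        With decay = 1 - 1/200, the steady-state frame total is ~200x the
        per-batch frame count: ~15k * 200 ~= 3.0e6, matching the log.
        """

        def __init__(self, effective_batches: int = 200):
            self.decay = 1.0 - 1.0 / effective_batches
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, num_frames: float):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * num_frames
            self.frames = self.frames * self.decay + num_frames
            # -> (value printed as tot_loss, value printed as 'over N frames')
            return self.loss_sum / self.frames, self.frames
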
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:35:02,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3717006.6666666665, ans=0.1 2023-11-28 23:35:22,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=22.5 2023-11-28 23:35:53,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-28 23:35:53,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-28 23:35:55,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557600 2023-11-28 23:36:00,213 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4500, loss[loss=0.08446, simple_loss=0.1145, pruned_loss=0.01973, audio_tagging_loss=0.007461, over 15439.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09099, pruned_loss=0.01195, audio_tagging_loss=0.00832, over 3050930.35 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:36:15,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3717406.6666666665, ans=0.0 2023-11-28 23:36:19,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.979e+01 9.760e+01 1.042e+02 1.445e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 23:36:21,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3717406.6666666665, ans=0.1 2023-11-28 23:36:35,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3717473.3333333335, ans=0.125 2023-11-28 23:36:58,518 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557650 2023-11-28 23:36:58,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3717606.6666666665, ans=0.125 2023-11-28 23:37:01,989 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4550, loss[loss=0.05492, simple_loss=0.07717, pruned_loss=0.008176, audio_tagging_loss=0.008153, over 15782.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.09018, pruned_loss=0.01187, audio_tagging_loss=0.008341, over 3052796.41 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:37:13,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. 
limit=6.0 2023-11-28 23:37:30,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3717806.6666666665, ans=0.2 2023-11-28 23:37:30,096 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:37:32,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3717806.6666666665, ans=0.0 2023-11-28 23:37:38,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3717873.3333333335, ans=0.125 2023-11-28 23:37:48,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3717873.3333333335, ans=0.0 2023-11-28 23:37:50,811 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:37:59,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557700 2023-11-28 23:37:59,275 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:38:02,398 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4600, loss[loss=0.06262, simple_loss=0.08467, pruned_loss=0.01202, audio_tagging_loss=0.008254, over 14982.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08942, pruned_loss=0.0119, audio_tagging_loss=0.008482, over 3056081.86 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:38:22,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 9.011e+01 9.487e+01 1.017e+02 1.254e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 23:38:36,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3718140.0, ans=0.2 2023-11-28 23:38:39,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3718206.6666666665, ans=0.125 2023-11-28 23:38:45,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0 2023-11-28 23:39:01,155 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557750 2023-11-28 23:39:04,627 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4650, loss[loss=0.08536, simple_loss=0.1145, pruned_loss=0.02005, audio_tagging_loss=0.008067, over 14859.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08774, pruned_loss=0.01178, audio_tagging_loss=0.008607, over 3050625.17 frames. 
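
The scaling.py:1118 "WithLoss" lines track an auxiliary penalty attached to the named self_attn_weights tensors; loss-sum=0.000e+00 throughout this stretch means the penalty never engaged. The mechanics below are assumed, a generic pass-through module that accumulates and reports such a penalty, not the actual implementation:

    import torch

    class WithAuxLoss(torch.nn.Module):
        """Pass x through unchanged while accumulating a penalty (sketch only;
        a real version would also feed the penalty into the backward pass)."""

        def __init__(self, name: str, penalty_fn):
            super().__init__()
            self.name = name
            self.penalty_fn = penalty_fn  # x -> scalar tensor, zero when x is in range
            self.loss_sum = 0.0

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            with torch.no_grad():
                self.loss_sum += float(self.penalty_fn(x))
            return x

        def log(self) -> None:
            print(f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}")
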
], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:39:06,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3718340.0, ans=0.125 2023-11-28 23:39:41,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3718540.0, ans=0.025 2023-11-28 23:39:52,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3718540.0, ans=0.1 2023-11-28 23:40:03,743 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557800 2023-11-28 23:40:07,624 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4700, loss[loss=0.05423, simple_loss=0.07424, pruned_loss=0.01024, audio_tagging_loss=0.006874, over 16506.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08916, pruned_loss=0.01209, audio_tagging_loss=0.008586, over 3050187.74 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:40:11,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3718673.3333333335, ans=0.0 2023-11-28 23:40:15,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2023-11-28 23:40:17,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3718673.3333333335, ans=0.125 2023-11-28 23:40:25,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.230e+01 9.778e+01 1.067e+02 1.457e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-28 23:40:29,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3718740.0, ans=0.05 2023-11-28 23:40:32,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2023-11-28 23:40:39,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3718806.6666666665, ans=0.125 2023-11-28 23:40:48,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3718873.3333333335, ans=0.125 2023-11-28 23:41:04,921 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557850 2023-11-28 23:41:08,267 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4750, loss[loss=0.08827, simple_loss=0.1298, pruned_loss=0.01667, audio_tagging_loss=0.006704, over 17189.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.0881, pruned_loss=0.01189, audio_tagging_loss=0.00875, over 3047393.96 frames. 
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:41:15,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3719006.6666666665, ans=0.2 2023-11-28 23:41:20,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3719073.3333333335, ans=0.125 2023-11-28 23:41:34,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3719140.0, ans=0.125 2023-11-28 23:41:37,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3719140.0, ans=0.125 2023-11-28 23:41:59,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3719273.3333333335, ans=0.125 2023-11-28 23:42:06,278 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557900 2023-11-28 23:42:10,513 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4800, loss[loss=0.06762, simple_loss=0.09766, pruned_loss=0.01022, audio_tagging_loss=0.008568, over 14738.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08847, pruned_loss=0.01176, audio_tagging_loss=0.008781, over 3051912.96 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:42:29,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3719406.6666666665, ans=0.125 2023-11-28 23:42:29,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3719406.6666666665, ans=0.125 2023-11-28 23:42:30,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 9.011e+01 9.522e+01 1.013e+02 1.336e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 23:42:30,577 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:42:51,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3719540.0, ans=0.125 2023-11-28 23:42:56,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3719540.0, ans=0.125 2023-11-28 23:43:09,191 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557950 2023-11-28 23:43:10,501 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:43:12,627 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4850, loss[loss=0.05888, simple_loss=0.07623, pruned_loss=0.01072, audio_tagging_loss=0.01005, over 15829.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.0893, pruned_loss=0.01194, audio_tagging_loss=0.008825, over 3049276.96 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:43:30,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5 2023-11-28 23:43:31,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. 
limit=15.0 2023-11-28 23:43:44,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3719806.6666666665, ans=0.1 2023-11-28 23:43:51,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3719873.3333333335, ans=0.125 2023-11-28 23:43:52,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3719873.3333333335, ans=0.125 2023-11-28 23:43:57,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-28 23:44:10,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558000 2023-11-28 23:44:14,549 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4900, loss[loss=0.07576, simple_loss=0.1023, pruned_loss=0.01569, audio_tagging_loss=0.008901, over 15095.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08947, pruned_loss=0.01194, audio_tagging_loss=0.00876, over 3045848.73 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:44:24,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=3720006.6666666665, ans=22.5 2023-11-28 23:44:27,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3720073.3333333335, ans=0.125 2023-11-28 23:44:35,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.818e+01 9.390e+01 1.021e+02 1.310e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 23:45:11,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3720273.3333333335, ans=0.125 2023-11-28 23:45:12,750 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558050 2023-11-28 23:45:16,102 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4950, loss[loss=0.08, simple_loss=0.1109, pruned_loss=0.01668, audio_tagging_loss=0.007849, over 15905.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08956, pruned_loss=0.01202, audio_tagging_loss=0.00858, over 3043727.45 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:45:22,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3720340.0, ans=0.125 2023-11-28 23:45:44,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3720473.3333333335, ans=0.125 2023-11-28 23:46:00,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3720540.0, ans=0.125 2023-11-28 23:46:03,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3720606.6666666665, ans=0.0 2023-11-28 23:46:14,344 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558100 2023-11-28 23:46:18,274 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5000, loss[loss=0.06623, simple_loss=0.09342, pruned_loss=0.01142, audio_tagging_loss=0.008106, over 15416.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08975, pruned_loss=0.01201, audio_tagging_loss=0.008436, over 3042967.57 frames. 
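The Whitening records compare a per-module metric against a scheduled limit (e.g. metric=7.66 vs. limit=15.0 above); the metric measures how far a group of activations is from having an isotropic ("white") covariance. The exact formula is not shown in the log; the sketch below uses one standard whiteness measure, the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions. Treat it as an assumption-labeled approximation, not the logged implementation.

# Hedged sketch of a whiteness metric: equals ~1.0 when the covariance is a
# multiple of the identity ("white") and grows as energy concentrates in a
# few directions. This approximates, not reproduces, the metric in the log.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one whitening group.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]        # (C, C) sample covariance
    eigs = torch.linalg.eigvalsh(cov)     # real eigenvalues, ascending
    return (eigs ** 2).mean() / eigs.mean() ** 2

torch.manual_seed(0)
white = torch.randn(10000, 64)
skewed = white * torch.linspace(0.1, 2.0, 64)   # very unequal variances
print(whitening_metric(white))   # ~1.0 plus sampling noise
print(whitening_metric(skewed))  # noticeably larger: far from white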
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:46:20,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2023-11-28 23:46:20,794 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:46:20,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3720673.3333333335, ans=0.2 2023-11-28 23:46:38,209 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.963e+01 9.566e+01 1.007e+02 2.358e+02, threshold=1.913e+02, percent-clipped=1.0 2023-11-28 23:46:38,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3720740.0, ans=0.125 2023-11-28 23:47:00,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3720873.3333333335, ans=0.5 2023-11-28 23:47:04,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3720873.3333333335, ans=10.0 2023-11-28 23:47:15,723 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558150 2023-11-28 23:47:16,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.16 vs. limit=10.0 2023-11-28 23:47:19,185 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5050, loss[loss=0.05807, simple_loss=0.07412, pruned_loss=0.01322, audio_tagging_loss=0.007788, over 14832.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08947, pruned_loss=0.01189, audio_tagging_loss=0.008383, over 3037113.63 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:47:29,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3721073.3333333335, ans=0.0 2023-11-28 23:48:01,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3721206.6666666665, ans=0.125 2023-11-28 23:48:11,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3721273.3333333335, ans=0.2 2023-11-28 23:48:13,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3721273.3333333335, ans=0.125 2023-11-28 23:48:16,731 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558200 2023-11-28 23:48:21,056 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5100, loss[loss=0.08167, simple_loss=0.1178, pruned_loss=0.0174, audio_tagging_loss=0.005385, over 15553.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08976, pruned_loss=0.01196, audio_tagging_loss=0.008326, over 3047327.38 frames. 
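Each Clipping_scale=2.0 record above reports a five-number summary (min, Q1, median, Q3, max) of recent gradient norms together with a threshold and a percent-clipped figure. In every logged instance the threshold equals clipping_scale times the median quartile (e.g. 2.0 x 9.566e+01 = 1.913e+02 just above), so a reasonable reading is that gradients are clipped at a multiple of a running median norm. The sketch below implements that reading; the history length and per-step bookkeeping are invented.

# Sketch of median-based gradient clipping, assuming (as the logged numbers
# suggest: threshold == clipping_scale * median) that the clip threshold
# tracks a multiple of the median recent gradient norm.
from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)   # recent total grad norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        self.norms.append(norm.item())
        threshold = self.scale * torch.tensor(list(self.norms)).median().item()
        if norm > threshold:
            # "percent-clipped" in the log would be the fraction of steps
            # that take this branch.
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm.item()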
], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:48:24,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3721340.0, ans=0.125 2023-11-28 23:48:44,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.897e+01 9.648e+01 1.044e+02 1.353e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 23:48:46,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3721473.3333333335, ans=0.125 2023-11-28 23:48:47,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3721473.3333333335, ans=0.125 2023-11-28 23:48:53,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3721473.3333333335, ans=0.125 2023-11-28 23:49:18,562 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558250 2023-11-28 23:49:21,927 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5150, loss[loss=0.09057, simple_loss=0.13, pruned_loss=0.0197, audio_tagging_loss=0.005865, over 15481.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.09, pruned_loss=0.01192, audio_tagging_loss=0.008339, over 3039753.20 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:49:27,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3721673.3333333335, ans=0.2 2023-11-28 23:49:41,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3721740.0, ans=0.125 2023-11-28 23:49:51,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-11-28 23:50:21,692 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558300 2023-11-28 23:50:25,153 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5200, loss[loss=0.06638, simple_loss=0.09182, pruned_loss=0.01136, audio_tagging_loss=0.009112, over 14981.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.09027, pruned_loss=0.01199, audio_tagging_loss=0.008283, over 3035168.45 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:50:46,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 9.041e+01 9.653e+01 1.034e+02 1.419e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 23:50:57,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3722140.0, ans=0.125 2023-11-28 23:51:22,584 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558350 2023-11-28 23:51:26,646 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5250, loss[loss=0.07042, simple_loss=0.1091, pruned_loss=0.008231, audio_tagging_loss=0.007662, over 15439.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09081, pruned_loss=0.01203, audio_tagging_loss=0.008303, over 3039277.32 frames. 
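The per-batch loss records decompose into simple_loss, pruned_loss and audio_tagging_loss components. The logged totals are consistent with a weighted sum in which simple_loss is down-weighted by 0.5 and the other two terms enter at full weight: for batch 4700 above, 0.5 x 0.08916 + 0.01209 + 0.008586 ~= 0.06526, the logged loss. The helper below encodes that reading; the scales are inferred from this arithmetic, not read from the training configuration.

# Sketch of how the logged total appears to be assembled from its parts.
# The 0.5 / 1.0 scales reproduce the logged totals but are inferred, not
# taken from the run's config.
def total_loss(simple_loss: float, pruned_loss: float,
               audio_tagging_loss: float,
               simple_scale: float = 0.5,
               audio_tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

print(total_loss(0.08916, 0.01209, 0.008586))  # ~0.06526, as logged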
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:51:34,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3722340.0, ans=0.2 2023-11-28 23:52:06,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3722540.0, ans=0.1 2023-11-28 23:52:14,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3722606.6666666665, ans=0.0 2023-11-28 23:52:20,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3722606.6666666665, ans=0.125 2023-11-28 23:52:23,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-28 23:52:24,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558400 2023-11-28 23:52:25,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3722606.6666666665, ans=0.0 2023-11-28 23:52:28,341 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5300, loss[loss=0.06917, simple_loss=0.09348, pruned_loss=0.01418, audio_tagging_loss=0.008254, over 13815.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0903, pruned_loss=0.01206, audio_tagging_loss=0.008387, over 3040498.74 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:52:48,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3722740.0, ans=0.2 2023-11-28 23:52:50,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 9.120e+01 9.836e+01 1.047e+02 1.238e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-28 23:52:51,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-28 23:53:05,253 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:53:06,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3722873.3333333335, ans=0.0 2023-11-28 23:53:15,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3722873.3333333335, ans=0.5 2023-11-28 23:53:16,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3722940.0, ans=0.0 2023-11-28 23:53:26,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558450 2023-11-28 23:53:30,160 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5350, loss[loss=0.06191, simple_loss=0.07875, pruned_loss=0.01243, audio_tagging_loss=0.0101, over 14996.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08998, pruned_loss=0.01189, audio_tagging_loss=0.008474, over 3041144.55 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:53:39,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3723006.6666666665, ans=0.125 2023-11-28 23:53:58,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.05 vs. 
limit=15.0 2023-11-28 23:54:05,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0 2023-11-28 23:54:25,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3723273.3333333335, ans=0.125 2023-11-28 23:54:27,993 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558500 2023-11-28 23:54:30,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3723340.0, ans=6.0 2023-11-28 23:54:31,509 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5400, loss[loss=0.08283, simple_loss=0.1076, pruned_loss=0.02061, audio_tagging_loss=0.00844, over 14726.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09023, pruned_loss=0.0121, audio_tagging_loss=0.008481, over 3036415.24 frames. ], batch size: 52, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:54:31,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3723340.0, ans=0.0 2023-11-28 23:54:37,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2023-11-28 23:54:54,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.310e+01 8.983e+01 9.673e+01 1.019e+02 1.246e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 23:55:21,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3723606.6666666665, ans=0.1 2023-11-28 23:55:29,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723606.6666666665, ans=0.1 2023-11-28 23:55:29,946 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558550 2023-11-28 23:55:32,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2023-11-28 23:55:33,327 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5450, loss[loss=0.05731, simple_loss=0.07676, pruned_loss=0.009575, audio_tagging_loss=0.009357, over 15420.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08991, pruned_loss=0.01197, audio_tagging_loss=0.008553, over 3037074.87 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:55:33,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3723673.3333333335, ans=0.125 2023-11-28 23:55:38,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3723673.3333333335, ans=0.0 2023-11-28 23:55:46,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723740.0, ans=0.1 2023-11-28 23:56:08,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. 
limit=15.0 2023-11-28 23:56:13,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3723873.3333333335, ans=0.0 2023-11-28 23:56:13,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3723873.3333333335, ans=0.2 2023-11-28 23:56:19,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3723873.3333333335, ans=0.0 2023-11-28 23:56:20,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3723873.3333333335, ans=0.125 2023-11-28 23:56:26,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3723940.0, ans=0.1 2023-11-28 23:56:31,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558600 2023-11-28 23:56:35,658 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5500, loss[loss=0.05987, simple_loss=0.08819, pruned_loss=0.007838, audio_tagging_loss=0.007942, over 14685.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08982, pruned_loss=0.01199, audio_tagging_loss=0.008578, over 3036654.10 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:56:36,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-28 23:56:49,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0 2023-11-28 23:56:57,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 8.978e+01 9.679e+01 1.033e+02 1.249e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 23:57:06,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0 2023-11-28 23:57:15,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3724206.6666666665, ans=0.125 2023-11-28 23:57:16,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3724206.6666666665, ans=0.125 2023-11-28 23:57:33,457 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558650 2023-11-28 23:57:36,770 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5550, loss[loss=0.05925, simple_loss=0.07202, pruned_loss=0.01042, audio_tagging_loss=0.01281, over 14917.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08963, pruned_loss=0.01208, audio_tagging_loss=0.008722, over 3043449.12 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:57:38,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3724340.0, ans=0.125 2023-11-28 23:57:43,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3724340.0, ans=0.04949747468305833 2023-11-28 23:57:52,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. 
limit=22.5 2023-11-28 23:58:10,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3724473.3333333335, ans=0.1 2023-11-28 23:58:12,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3724540.0, ans=0.1 2023-11-28 23:58:35,092 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558700 2023-11-28 23:58:35,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2023-11-28 23:58:38,526 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5600, loss[loss=0.07405, simple_loss=0.1015, pruned_loss=0.01611, audio_tagging_loss=0.007166, over 15563.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09055, pruned_loss=0.01213, audio_tagging_loss=0.008809, over 3047829.23 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:59:00,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.219e+01 9.778e+01 1.037e+02 1.295e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-28 23:59:08,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3724806.6666666665, ans=0.125 2023-11-28 23:59:14,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724873.3333333335, ans=0.1 2023-11-28 23:59:19,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3724873.3333333335, ans=0.1 2023-11-28 23:59:24,209 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:59:30,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0 2023-11-28 23:59:37,046 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558750 2023-11-28 23:59:40,486 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5650, loss[loss=0.08562, simple_loss=0.1115, pruned_loss=0.01937, audio_tagging_loss=0.01052, over 14521.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09001, pruned_loss=0.0121, audio_tagging_loss=0.008946, over 3050679.62 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:59:54,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3725073.3333333335, ans=0.0 2023-11-29 00:00:00,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3725073.3333333335, ans=0.125 2023-11-29 00:00:21,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. 
limit=15.0 2023-11-29 00:00:31,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3725273.3333333335, ans=0.0 2023-11-29 00:00:37,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0 2023-11-29 00:00:37,949 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558800 2023-11-29 00:00:41,892 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5700, loss[loss=0.05616, simple_loss=0.07719, pruned_loss=0.01001, audio_tagging_loss=0.007552, over 15090.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09004, pruned_loss=0.01208, audio_tagging_loss=0.008989, over 3047831.79 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:01:04,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 8.841e+01 9.405e+01 1.014e+02 1.366e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-29 00:01:29,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3725540.0, ans=0.1 2023-11-29 00:01:41,163 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558850 2023-11-29 00:01:44,642 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5750, loss[loss=0.08335, simple_loss=0.1143, pruned_loss=0.01852, audio_tagging_loss=0.007675, over 15157.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08915, pruned_loss=0.01192, audio_tagging_loss=0.008786, over 3043070.44 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:01:51,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3725673.3333333335, ans=15.0 2023-11-29 00:01:52,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3725673.3333333335, ans=0.125 2023-11-29 00:01:54,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3725673.3333333335, ans=0.0 2023-11-29 00:01:58,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3725740.0, ans=0.0 2023-11-29 00:02:01,142 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:02:10,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3725806.6666666665, ans=0.0 2023-11-29 00:02:10,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3725806.6666666665, ans=0.0 2023-11-29 00:02:26,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3725873.3333333335, ans=0.0 2023-11-29 00:02:30,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3725873.3333333335, ans=0.2 2023-11-29 00:02:30,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3725873.3333333335, ans=0.125 2023-11-29 00:02:42,717 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558900 2023-11-29 00:02:46,200 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5800, loss[loss=0.05616, simple_loss=0.08114, pruned_loss=0.006289, 
audio_tagging_loss=0.009305, over 15303.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08917, pruned_loss=0.01189, audio_tagging_loss=0.008609, over 3047608.38 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:02:56,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2023-11-29 00:03:04,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3726073.3333333335, ans=0.125 2023-11-29 00:03:06,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3726073.3333333335, ans=0.125 2023-11-29 00:03:08,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 8.867e+01 9.470e+01 1.000e+02 1.681e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-29 00:03:13,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3726140.0, ans=0.125 2023-11-29 00:03:37,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3726273.3333333335, ans=0.0 2023-11-29 00:03:43,103 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558950 2023-11-29 00:03:46,486 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5850, loss[loss=0.07867, simple_loss=0.115, pruned_loss=0.01655, audio_tagging_loss=0.004611, over 15186.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08911, pruned_loss=0.01191, audio_tagging_loss=0.008618, over 3052166.32 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:04:20,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3726473.3333333335, ans=0.0 2023-11-29 00:04:33,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3726540.0, ans=0.5 2023-11-29 00:04:44,577 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559000 2023-11-29 00:04:45,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3726606.6666666665, ans=0.2 2023-11-29 00:04:49,138 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5900, loss[loss=0.07188, simple_loss=0.101, pruned_loss=0.0155, audio_tagging_loss=0.005853, over 14216.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09002, pruned_loss=0.01212, audio_tagging_loss=0.008534, over 3050153.27 frames. 
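Each record pairs a per-batch loss ("over 15303.00 frames") with a tot_loss over roughly 3.0M frames that drifts only slowly from batch to batch, consistent with a frame-weighted running average over a decaying window. The sketch below shows that bookkeeping pattern; the decay constant and class name are invented for illustration.

# Hedged sketch of the tot_loss bookkeeping: a frame-weighted running mean
# over a decayed window. Only the frame weighting is suggested by the log;
# the decay constant is invented.
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss: float, num_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames
        return self.loss_sum / self.frame_sum  # the slowly-moving tot_loss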
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:04:50,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3726673.3333333335, ans=0.125 2023-11-29 00:04:54,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3726673.3333333335, ans=0.125 2023-11-29 00:05:12,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.979e+01 9.571e+01 1.024e+02 1.288e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 00:05:47,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559050 2023-11-29 00:05:49,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3726940.0, ans=0.125 2023-11-29 00:05:51,240 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5950, loss[loss=0.07439, simple_loss=0.1047, pruned_loss=0.01454, audio_tagging_loss=0.007509, over 15996.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09107, pruned_loss=0.01236, audio_tagging_loss=0.008463, over 3056265.06 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:05:56,042 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:06:07,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3727073.3333333335, ans=0.5 2023-11-29 00:06:07,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3727073.3333333335, ans=0.125 2023-11-29 00:06:18,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3727140.0, ans=0.1 2023-11-29 00:06:48,331 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559100 2023-11-29 00:06:51,706 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6000, loss[loss=0.04894, simple_loss=0.0595, pruned_loss=0.008301, audio_tagging_loss=0.01089, over 14578.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09043, pruned_loss=0.01222, audio_tagging_loss=0.008429, over 3050262.66 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:06:51,707 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 00:07:10,208 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0661, 3.7860, 3.8694, 3.4465, 4.2177, 4.2492, 4.3979, 4.2956], device='cuda:3') 2023-11-29 00:07:31,869 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05752, simple_loss=0.05049, pruned_loss=0.005333, audio_tagging_loss=0.02694, over 4681554.00 frames. 2023-11-29 00:07:31,869 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 00:07:56,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 9.062e+01 9.671e+01 1.050e+02 2.392e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-29 00:08:13,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3727540.0, ans=0.0 2023-11-29 00:08:17,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.52 vs. 
limit=12.0 2023-11-29 00:08:17,529 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:08:31,001 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559150 2023-11-29 00:08:34,407 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6050, loss[loss=0.07703, simple_loss=0.1063, pruned_loss=0.01284, audio_tagging_loss=0.01105, over 15531.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09089, pruned_loss=0.01229, audio_tagging_loss=0.008433, over 3055776.64 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:09:02,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-29 00:09:31,106 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559200 2023-11-29 00:09:34,931 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6100, loss[loss=0.08341, simple_loss=0.1199, pruned_loss=0.01523, audio_tagging_loss=0.008231, over 15402.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09061, pruned_loss=0.01223, audio_tagging_loss=0.008448, over 3046749.27 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:09:36,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3728006.6666666665, ans=0.05 2023-11-29 00:09:37,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3728006.6666666665, ans=0.0 2023-11-29 00:09:52,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3728073.3333333335, ans=0.125 2023-11-29 00:09:55,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3728073.3333333335, ans=0.125 2023-11-29 00:09:58,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.989e+01 9.555e+01 1.035e+02 1.326e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 00:10:04,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3728140.0, ans=0.125 2023-11-29 00:10:04,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3728140.0, ans=0.035 2023-11-29 00:10:31,606 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559250 2023-11-29 00:10:35,632 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6150, loss[loss=0.06564, simple_loss=0.08488, pruned_loss=0.01085, audio_tagging_loss=0.01235, over 15284.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09071, pruned_loss=0.01219, audio_tagging_loss=0.008532, over 3047609.71 frames. 
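The WARNING above drops an AudioSet cut whose placeholder transcript produces more BPE tokens (24) than acoustic frames survive subsampling (100 frames before, 23 after): a transducer can emit at most one symbol per output frame, so such a cut would make the loss undefined. The filter below reproduces the logged arithmetic; the subsampling formula is one convention that maps 100 to 23, and the function names are hypothetical.

# Hedged sketch of the filter implied by the WARNING records.
def frames_after_subsampling(num_frames: int) -> int:
    # One convention consistent with the logged numbers (100 -> 23):
    # a conv front-end taking T -> (T - 7) // 2, then a further 2x downsample.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer constraint: at most one emitted token per output frame.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the warning
print(keep_cut(100, 24))              # False -> the cut is excluded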
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:10:52,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3728406.6666666665, ans=0.125 2023-11-29 00:11:27,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.47 vs. limit=15.0 2023-11-29 00:11:33,889 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559300 2023-11-29 00:11:35,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3728606.6666666665, ans=0.125 2023-11-29 00:11:37,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2023-11-29 00:11:38,005 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6200, loss[loss=0.0937, simple_loss=0.126, pruned_loss=0.02544, audio_tagging_loss=0.005288, over 14738.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09029, pruned_loss=0.01218, audio_tagging_loss=0.008581, over 3050332.82 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:11:55,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3728740.0, ans=0.07 2023-11-29 00:12:01,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.926e+01 8.945e+01 9.631e+01 1.031e+02 1.323e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 00:12:01,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3728806.6666666665, ans=0.125 2023-11-29 00:12:35,573 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559350 2023-11-29 00:12:39,043 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6250, loss[loss=0.06284, simple_loss=0.09302, pruned_loss=0.01095, audio_tagging_loss=0.005384, over 14325.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09038, pruned_loss=0.01218, audio_tagging_loss=0.008648, over 3052498.86 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:13:01,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3729073.3333333335, ans=0.0 2023-11-29 00:13:11,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3729140.0, ans=0.0 2023-11-29 00:13:21,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3729206.6666666665, ans=0.1 2023-11-29 00:13:21,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3729206.6666666665, ans=0.1 2023-11-29 00:13:31,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3729273.3333333335, ans=0.1 2023-11-29 00:13:36,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559400 2023-11-29 00:13:39,767 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6300, loss[loss=0.07133, simple_loss=0.101, pruned_loss=0.01098, audio_tagging_loss=0.009821, over 15352.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09062, pruned_loss=0.01217, audio_tagging_loss=0.008835, over 3053255.88 frames. 
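During the validation pass logged just above, the attn_weights_entropy diagnostic prints a small tensor of entropies for a self-attention module's weights; higher values mean flatter attention, with an upper bound of log of the number of keys. The sketch below shows one way to compute it, assuming the weights are a softmax distribution over keys; the reduction over queries (and whether each printed value corresponds to a head) is a guess, not confirmed by the log.

# Sketch of an attention-entropy diagnostic: Shannon entropy of the
# per-query distribution over keys, averaged to one value per head.
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys); rows sum to 1 after softmax.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)                         # one entropy per head

attn = torch.softmax(torch.randn(4, 10, 50), dim=-1)
print(attn_weights_entropy(attn))  # each value <= log(50) ~= 3.91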
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:13:42,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3729340.0, ans=0.015 2023-11-29 00:14:06,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.910e+01 9.740e+01 1.040e+02 1.205e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 00:14:10,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=12.0 2023-11-29 00:14:18,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3729540.0, ans=0.125 2023-11-29 00:14:23,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3729540.0, ans=0.125 2023-11-29 00:14:24,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3729540.0, ans=0.0 2023-11-29 00:14:28,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3729606.6666666665, ans=0.0 2023-11-29 00:14:28,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3729606.6666666665, ans=0.1 2023-11-29 00:14:37,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2023-11-29 00:14:39,779 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559450 2023-11-29 00:14:43,935 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6350, loss[loss=0.08191, simple_loss=0.1095, pruned_loss=0.01927, audio_tagging_loss=0.0079, over 14828.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08984, pruned_loss=0.01205, audio_tagging_loss=0.008927, over 3055346.34 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:14:47,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3729673.3333333335, ans=22.5 2023-11-29 00:14:47,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.97 vs. 
limit=5.0 2023-11-29 00:15:19,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3729873.3333333335, ans=0.125 2023-11-29 00:15:21,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3729873.3333333335, ans=0.2 2023-11-29 00:15:21,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3729873.3333333335, ans=0.125 2023-11-29 00:15:29,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3729873.3333333335, ans=0.125 2023-11-29 00:15:30,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3729873.3333333335, ans=0.2 2023-11-29 00:15:33,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3729940.0, ans=0.05 2023-11-29 00:15:41,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3729940.0, ans=0.0 2023-11-29 00:15:42,316 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559500 2023-11-29 00:15:42,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3729940.0, ans=0.125 2023-11-29 00:15:45,774 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6400, loss[loss=0.06203, simple_loss=0.08415, pruned_loss=0.0112, audio_tagging_loss=0.008752, over 14461.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0895, pruned_loss=0.01207, audio_tagging_loss=0.008964, over 3049773.74 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:15:56,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3730073.3333333335, ans=0.125 2023-11-29 00:15:56,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3730073.3333333335, ans=0.125 2023-11-29 00:16:10,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 8.936e+01 9.646e+01 1.038e+02 1.369e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 00:16:33,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3730206.6666666665, ans=0.125 2023-11-29 00:16:37,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2023-11-29 00:16:43,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559550 2023-11-29 00:16:46,990 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6450, loss[loss=0.05347, simple_loss=0.06872, pruned_loss=0.01075, audio_tagging_loss=0.008359, over 14738.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.0894, pruned_loss=0.01205, audio_tagging_loss=0.008941, over 3048689.35 frames. 
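The grad_scale field steps between 8.0, 16.0 and 32.0 across the batches above, the signature of dynamic loss scaling in fp16 training: the scaler halves its scale when gradients overflow and doubles it again after a run of clean steps. The sketch below shows the standard torch.cuda.amp pattern that produces this behavior; model, optimizer, criterion and batch are placeholders, and the growth/backoff constants are PyTorch defaults rather than values read from this run.

# Sketch of the dynamic loss scaling implied by the fluctuating grad_scale
# field (8 <-> 16 <-> 32), using the stock torch.cuda.amp API.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                   growth_factor=2.0,   # 16 -> 32 after clean steps
                                   backoff_factor=0.5)  # 32 -> 16 on inf/NaN grads

def train_step(model, optimizer, criterion, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # mixed-precision forward pass
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()              # backward on the scaled loss
    scaler.step(optimizer)                     # skipped if grads overflowed
    scaler.update()                            # adjusts the logged grad_scale
    return loss.detach()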
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:16:56,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3730340.0, ans=0.125 2023-11-29 00:17:16,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3730473.3333333335, ans=0.2 2023-11-29 00:17:16,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-11-29 00:17:30,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3730540.0, ans=0.0 2023-11-29 00:17:32,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3730540.0, ans=0.125 2023-11-29 00:17:36,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3730606.6666666665, ans=0.1 2023-11-29 00:17:37,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3730606.6666666665, ans=0.1 2023-11-29 00:17:38,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3730606.6666666665, ans=0.2 2023-11-29 00:17:46,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559600 2023-11-29 00:17:50,647 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6500, loss[loss=0.07549, simple_loss=0.1029, pruned_loss=0.01633, audio_tagging_loss=0.007685, over 15065.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08904, pruned_loss=0.01195, audio_tagging_loss=0.008845, over 3050840.49 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:17:54,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3730673.3333333335, ans=0.0 2023-11-29 00:17:59,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=22.5 2023-11-29 00:18:03,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3730740.0, ans=0.125 2023-11-29 00:18:16,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.806e+01 9.149e+01 9.988e+01 1.072e+02 1.426e+02, threshold=1.998e+02, percent-clipped=0.0 2023-11-29 00:18:20,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3730806.6666666665, ans=10.0 2023-11-29 00:18:28,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.76 vs. 
limit=15.0 2023-11-29 00:18:29,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3730873.3333333335, ans=0.0 2023-11-29 00:18:39,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3730940.0, ans=0.0 2023-11-29 00:18:39,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3730940.0, ans=0.125 2023-11-29 00:18:49,045 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559650 2023-11-29 00:18:52,569 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6550, loss[loss=0.05524, simple_loss=0.08612, pruned_loss=0.005821, audio_tagging_loss=0.006362, over 15127.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08942, pruned_loss=0.01197, audio_tagging_loss=0.008749, over 3051358.17 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:18:57,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2023-11-29 00:19:06,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2023-11-29 00:19:13,658 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:19:13,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3731073.3333333335, ans=0.125 2023-11-29 00:19:38,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3731206.6666666665, ans=0.2 2023-11-29 00:19:43,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2023-11-29 00:19:46,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3731273.3333333335, ans=0.1 2023-11-29 00:19:51,037 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559700 2023-11-29 00:19:54,484 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6600, loss[loss=0.05775, simple_loss=0.07303, pruned_loss=0.009623, audio_tagging_loss=0.01161, over 15118.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.0896, pruned_loss=0.01203, audio_tagging_loss=0.008616, over 3047535.96 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:19:59,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3731340.0, ans=0.125 2023-11-29 00:20:20,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-29 00:20:20,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.919e+01 9.465e+01 1.014e+02 1.286e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-29 00:20:52,582 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559750 2023-11-29 00:20:56,666 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6650, loss[loss=0.07302, simple_loss=0.09849, pruned_loss=0.01472, audio_tagging_loss=0.009049, over 14428.00 frames. 
], tot_loss[loss=0.06558, simple_loss=0.08993, pruned_loss=0.01203, audio_tagging_loss=0.008582, over 3037959.10 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:21:00,000 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:21:05,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-29 00:21:20,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3731806.6666666665, ans=0.2 2023-11-29 00:21:25,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3731806.6666666665, ans=0.0 2023-11-29 00:21:42,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3731873.3333333335, ans=0.04949747468305833 2023-11-29 00:21:42,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2023-11-29 00:21:52,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3731940.0, ans=0.0 2023-11-29 00:21:54,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559800 2023-11-29 00:21:58,734 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6700, loss[loss=0.0574, simple_loss=0.07747, pruned_loss=0.01099, audio_tagging_loss=0.007683, over 14472.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08994, pruned_loss=0.012, audio_tagging_loss=0.008515, over 3034567.18 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:22:07,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3732006.6666666665, ans=0.0 2023-11-29 00:22:24,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.131e+01 9.664e+01 1.036e+02 1.396e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 00:22:36,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3732206.6666666665, ans=0.0 2023-11-29 00:22:39,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=12.0 2023-11-29 00:22:56,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559850 2023-11-29 00:22:59,580 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6750, loss[loss=0.07371, simple_loss=0.1047, pruned_loss=0.01512, audio_tagging_loss=0.006255, over 16182.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08943, pruned_loss=0.01204, audio_tagging_loss=0.008459, over 3037656.67 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:23:12,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3732406.6666666665, ans=0.0 2023-11-29 00:23:14,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3732406.6666666665, ans=0.125 2023-11-29 00:23:30,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3732473.3333333335, ans=0.125 2023-11-29 00:23:36,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3732540.0, ans=0.125 2023-11-29 00:23:37,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3732540.0, ans=0.0 2023-11-29 00:23:45,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=12.0 2023-11-29 00:23:49,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732606.6666666665, ans=0.1 2023-11-29 00:23:50,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3732606.6666666665, ans=0.125 2023-11-29 00:23:58,369 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559900 2023-11-29 00:24:01,788 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6800, loss[loss=0.05849, simple_loss=0.07843, pruned_loss=0.01187, audio_tagging_loss=0.007403, over 15281.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08971, pruned_loss=0.01205, audio_tagging_loss=0.008405, over 3033098.58 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:24:10,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2023-11-29 00:24:20,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0 2023-11-29 00:24:27,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 9.091e+01 9.725e+01 1.038e+02 3.036e+02, threshold=1.945e+02, percent-clipped=1.0 2023-11-29 00:24:33,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2023-11-29 00:24:46,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3732873.3333333335, ans=0.0 2023-11-29 00:24:59,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3732940.0, ans=0.0 2023-11-29 00:25:00,626 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559950 2023-11-29 00:25:04,116 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6850, loss[loss=0.07419, simple_loss=0.105, pruned_loss=0.01474, audio_tagging_loss=0.006936, over 15033.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08925, pruned_loss=0.0119, audio_tagging_loss=0.008453, over 3037298.65 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:25:10,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3733006.6666666665, ans=0.125 2023-11-29 00:25:18,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3733073.3333333335, ans=0.125 2023-11-29 00:25:20,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3733073.3333333335, ans=0.125 2023-11-29 00:25:25,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3733073.3333333335, ans=0.05 2023-11-29 00:25:29,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3733140.0, ans=0.125 2023-11-29 00:25:36,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3733140.0, ans=0.125 2023-11-29 00:25:37,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3733140.0, ans=0.0 2023-11-29 00:25:51,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733206.6666666665, ans=0.1 2023-11-29 00:25:58,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3733273.3333333335, ans=0.125 2023-11-29 00:26:02,175 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560000 2023-11-29 00:26:08,419 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6900, loss[loss=0.07995, simple_loss=0.1148, pruned_loss=0.01633, audio_tagging_loss=0.006245, over 15531.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.09022, pruned_loss=0.01208, audio_tagging_loss=0.008409, over 3038493.04 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:26:24,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3733406.6666666665, ans=0.0 2023-11-29 00:26:36,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.734e+01 9.536e+01 1.026e+02 1.241e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 00:26:37,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-11-29 00:26:42,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3733473.3333333335, ans=0.125 2023-11-29 00:26:47,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3733540.0, ans=0.2 2023-11-29 00:26:57,714 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 00:27:01,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3733606.6666666665, ans=0.125 2023-11-29 00:27:06,815 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560050 2023-11-29 00:27:07,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3733606.6666666665, ans=0.125 2023-11-29 00:27:10,743 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6950, loss[loss=0.06447, simple_loss=0.09002, pruned_loss=0.01148, audio_tagging_loss=0.007982, over 15943.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08956, pruned_loss=0.01179, audio_tagging_loss=0.008416, over 3042328.06 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:27:11,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3733673.3333333335, ans=0.2 2023-11-29 00:27:23,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2023-11-29 00:27:29,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733740.0, ans=0.1 2023-11-29 00:27:41,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0 2023-11-29 00:28:00,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2023-11-29 00:28:03,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3733940.0, ans=0.025 2023-11-29 00:28:03,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3733940.0, ans=0.0 2023-11-29 00:28:09,806 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560100 2023-11-29 00:28:13,143 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7000, loss[loss=0.05545, simple_loss=0.06949, pruned_loss=0.01137, audio_tagging_loss=0.009342, over 16376.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08995, pruned_loss=0.01198, audio_tagging_loss=0.008495, over 3043313.79 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:28:20,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.98 vs. 
limit=22.5 2023-11-29 00:28:22,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3734006.6666666665, ans=0.0 2023-11-29 00:28:29,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3734073.3333333335, ans=0.015 2023-11-29 00:28:32,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3734073.3333333335, ans=0.0 2023-11-29 00:28:37,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3734140.0, ans=0.125 2023-11-29 00:28:38,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.937e+01 9.480e+01 1.049e+02 1.230e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 00:28:58,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3734206.6666666665, ans=0.0 2023-11-29 00:29:10,473 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560150 2023-11-29 00:29:13,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2023-11-29 00:29:13,830 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7050, loss[loss=0.06544, simple_loss=0.08454, pruned_loss=0.01179, audio_tagging_loss=0.01138, over 15548.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08994, pruned_loss=0.01186, audio_tagging_loss=0.008531, over 3044100.91 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:29:16,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3734340.0, ans=0.125 2023-11-29 00:29:23,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3734340.0, ans=15.0 2023-11-29 00:29:25,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3734406.6666666665, ans=0.0 2023-11-29 00:29:40,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0 2023-11-29 00:29:45,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3734473.3333333335, ans=0.125 2023-11-29 00:29:52,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3734540.0, ans=0.0 2023-11-29 00:30:11,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560200 2023-11-29 00:30:16,160 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7100, loss[loss=0.04955, simple_loss=0.0679, pruned_loss=0.006105, audio_tagging_loss=0.009495, over 14530.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09031, pruned_loss=0.01206, audio_tagging_loss=0.008618, over 3047687.10 frames. 
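[Annotation] The optim.py entries report five order statistics (min/25%/median/75%/max) of recent gradient norms, and in each case the printed threshold equals Clipping_scale times the median: in the entry above, 2.0 * 9.480e+01 = 1.896e+02. A small sketch of median-based adaptive clipping over a rolling window; this illustrates the relationship visible in the log, not the optimizer's actual implementation:

import collections
import torch

class MedianClipper:
    """Clip gradients at clipping_scale * median of recently seen grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)

    def clip_(self, parameters) -> float:
        # Assumes at least one parameter has a gradient.
        params = [p for p in parameters if p.grad is not None]
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2
        ).item()
        self.norms.append(total_norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if total_norm > threshold:
            # Scale all grads down so the total norm equals the threshold;
            # such batches show up in the log as percent-clipped > 0.
            scale = threshold / total_norm
            for p in params:
                p.grad.mul_(scale)
        return threshold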
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:30:43,658 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.863e+01 9.578e+01 1.032e+02 1.275e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-29 00:30:54,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3734873.3333333335, ans=0.07 2023-11-29 00:30:59,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3734873.3333333335, ans=0.2 2023-11-29 00:31:13,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3734940.0, ans=0.125 2023-11-29 00:31:14,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560250 2023-11-29 00:31:16,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3735006.6666666665, ans=0.125 2023-11-29 00:31:18,589 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7150, loss[loss=0.0734, simple_loss=0.104, pruned_loss=0.01311, audio_tagging_loss=0.008265, over 15638.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08983, pruned_loss=0.01191, audio_tagging_loss=0.008679, over 3044559.06 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:31:23,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.68 vs. limit=6.0 2023-11-29 00:31:41,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3735140.0, ans=0.0 2023-11-29 00:31:51,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3735140.0, ans=0.1 2023-11-29 00:32:16,490 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560300 2023-11-29 00:32:17,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3735273.3333333335, ans=0.0 2023-11-29 00:32:19,860 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7200, loss[loss=0.05488, simple_loss=0.07175, pruned_loss=0.009013, audio_tagging_loss=0.009999, over 14747.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08965, pruned_loss=0.01193, audio_tagging_loss=0.008702, over 3043590.16 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:32:27,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0 2023-11-29 00:32:30,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3735406.6666666665, ans=0.1 2023-11-29 00:32:37,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.32 vs. 
limit=10.0 2023-11-29 00:32:39,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3735406.6666666665, ans=0.015 2023-11-29 00:32:47,172 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 8.965e+01 9.449e+01 1.037e+02 1.518e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-29 00:32:57,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3735540.0, ans=0.0 2023-11-29 00:32:59,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3735540.0, ans=0.125 2023-11-29 00:33:01,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3735540.0, ans=0.5 2023-11-29 00:33:17,229 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560350 2023-11-29 00:33:20,698 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7250, loss[loss=0.05778, simple_loss=0.07544, pruned_loss=0.01077, audio_tagging_loss=0.009292, over 15127.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08968, pruned_loss=0.01201, audio_tagging_loss=0.008693, over 3045490.79 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:33:28,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3735673.3333333335, ans=0.125 2023-11-29 00:33:36,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3735740.0, ans=0.125 2023-11-29 00:33:48,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.35 vs. limit=22.5 2023-11-29 00:33:53,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3735806.6666666665, ans=0.125 2023-11-29 00:34:03,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3735873.3333333335, ans=0.95 2023-11-29 00:34:19,887 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560400 2023-11-29 00:34:23,691 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7300, loss[loss=0.09699, simple_loss=0.1405, pruned_loss=0.02177, audio_tagging_loss=0.004964, over 14478.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08999, pruned_loss=0.0122, audio_tagging_loss=0.008597, over 3047048.26 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:34:46,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3736140.0, ans=0.2 2023-11-29 00:34:51,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.804e+01 9.526e+01 1.009e+02 1.275e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 00:35:00,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3736206.6666666665, ans=0.0 2023-11-29 00:35:03,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3736206.6666666665, ans=0.0 2023-11-29 00:35:21,730 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560450 2023-11-29 00:35:25,187 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7350, loss[loss=0.06852, simple_loss=0.09152, pruned_loss=0.01362, audio_tagging_loss=0.009147, over 15088.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09096, pruned_loss=0.01243, audio_tagging_loss=0.008442, over 3042690.16 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:35:30,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3736340.0, ans=0.2 2023-11-29 00:35:47,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3736406.6666666665, ans=0.125 2023-11-29 00:35:48,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3736473.3333333335, ans=0.95 2023-11-29 00:36:03,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3736540.0, ans=0.0 2023-11-29 00:36:11,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3736540.0, ans=0.125 2023-11-29 00:36:20,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2023-11-29 00:36:23,119 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560500 2023-11-29 00:36:26,706 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7400, loss[loss=0.06439, simple_loss=0.0878, pruned_loss=0.0114, audio_tagging_loss=0.009087, over 17315.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08999, pruned_loss=0.01232, audio_tagging_loss=0.008411, over 3039110.38 frames. ], batch size: 66, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:36:56,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 9.052e+01 9.894e+01 1.069e+02 1.258e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 00:37:08,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3736873.3333333335, ans=0.1 2023-11-29 00:37:24,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3736940.0, ans=0.07 2023-11-29 00:37:25,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560550 2023-11-29 00:37:29,062 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7450, loss[loss=0.08414, simple_loss=0.1175, pruned_loss=0.02174, audio_tagging_loss=0.003657, over 14308.00 frames. 
], tot_loss[loss=0.06496, simple_loss=0.08921, pruned_loss=0.01198, audio_tagging_loss=0.008378, over 3039677.63 frames. ], batch size: 52, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:37:30,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3737006.6666666665, ans=0.125 2023-11-29 00:37:40,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3737006.6666666665, ans=0.0 2023-11-29 00:38:10,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3737206.6666666665, ans=0.1 2023-11-29 00:38:26,420 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560600 2023-11-29 00:38:30,331 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7500, loss[loss=0.07072, simple_loss=0.09938, pruned_loss=0.01201, audio_tagging_loss=0.009018, over 14872.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08853, pruned_loss=0.01185, audio_tagging_loss=0.008444, over 3047281.24 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:38:42,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3737406.6666666665, ans=0.125 2023-11-29 00:38:55,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3737473.3333333335, ans=0.125 2023-11-29 00:38:58,467 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.888e+01 9.650e+01 1.039e+02 1.258e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 00:39:07,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=12.0 2023-11-29 00:39:25,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3737606.6666666665, ans=0.125 2023-11-29 00:39:28,985 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560650 2023-11-29 00:39:32,461 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7550, loss[loss=0.05611, simple_loss=0.08066, pruned_loss=0.006536, audio_tagging_loss=0.009243, over 15547.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08891, pruned_loss=0.0119, audio_tagging_loss=0.008483, over 3041006.42 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:39:45,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=22.5 2023-11-29 00:39:47,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3737740.0, ans=0.0 2023-11-29 00:40:30,927 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560700 2023-11-29 00:40:34,518 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7600, loss[loss=0.06954, simple_loss=0.09751, pruned_loss=0.01191, audio_tagging_loss=0.008879, over 15527.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08862, pruned_loss=0.01172, audio_tagging_loss=0.008498, over 3044360.96 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:40:56,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3738073.3333333335, ans=0.125 2023-11-29 00:41:03,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 9.023e+01 9.623e+01 1.078e+02 1.517e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 00:41:04,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3738140.0, ans=0.04949747468305833 2023-11-29 00:41:04,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3738140.0, ans=0.125 2023-11-29 00:41:15,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3738206.6666666665, ans=0.125 2023-11-29 00:41:16,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3738206.6666666665, ans=0.2 2023-11-29 00:41:21,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3738206.6666666665, ans=0.1 2023-11-29 00:41:32,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3738273.3333333335, ans=0.125 2023-11-29 00:41:33,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560750 2023-11-29 00:41:37,291 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7650, loss[loss=0.06632, simple_loss=0.08777, pruned_loss=0.01245, audio_tagging_loss=0.009981, over 15292.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08891, pruned_loss=0.0118, audio_tagging_loss=0.008494, over 3037763.52 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:42:06,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3738473.3333333335, ans=0.1 2023-11-29 00:42:14,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3738540.0, ans=0.125 2023-11-29 00:42:22,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3738540.0, ans=0.025 2023-11-29 00:42:24,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3738540.0, ans=0.125 2023-11-29 00:42:24,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-11-29 00:42:25,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5 2023-11-29 00:42:34,816 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560800 2023-11-29 00:42:36,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3738606.6666666665, ans=0.125 2023-11-29 00:42:38,563 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7700, loss[loss=0.07066, simple_loss=0.09815, pruned_loss=0.01244, audio_tagging_loss=0.009148, over 14929.00 frames. 
], tot_loss[loss=0.06487, simple_loss=0.08927, pruned_loss=0.0118, audio_tagging_loss=0.008428, over 3037477.41 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:43:05,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3738806.6666666665, ans=0.125 2023-11-29 00:43:08,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 9.118e+01 9.854e+01 1.042e+02 1.331e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 00:43:16,300 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:43:29,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3738940.0, ans=0.0 2023-11-29 00:43:36,695 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560850 2023-11-29 00:43:40,139 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7750, loss[loss=0.07642, simple_loss=0.1008, pruned_loss=0.01461, audio_tagging_loss=0.01139, over 15537.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08901, pruned_loss=0.01172, audio_tagging_loss=0.008509, over 3037317.31 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:43:41,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3739006.6666666665, ans=0.125 2023-11-29 00:44:02,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3739073.3333333335, ans=0.125 2023-11-29 00:44:38,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-29 00:44:38,681 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560900 2023-11-29 00:44:42,117 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7800, loss[loss=0.08879, simple_loss=0.1206, pruned_loss=0.02087, audio_tagging_loss=0.00764, over 15114.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08915, pruned_loss=0.01169, audio_tagging_loss=0.008494, over 3038306.88 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:44:55,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3739406.6666666665, ans=0.2 2023-11-29 00:44:58,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3739406.6666666665, ans=0.125 2023-11-29 00:45:10,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3739473.3333333335, ans=0.125 2023-11-29 00:45:10,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3739473.3333333335, ans=0.0 2023-11-29 00:45:11,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.680e+01 8.840e+01 9.485e+01 1.019e+02 1.348e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-29 00:45:14,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. 
limit=12.0 2023-11-29 00:45:16,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3739473.3333333335, ans=0.0 2023-11-29 00:45:33,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3739606.6666666665, ans=0.1 2023-11-29 00:45:41,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560950 2023-11-29 00:45:41,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2023-11-29 00:45:44,481 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7850, loss[loss=0.0629, simple_loss=0.08568, pruned_loss=0.0119, audio_tagging_loss=0.008159, over 15857.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08984, pruned_loss=0.01184, audio_tagging_loss=0.008558, over 3042004.69 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:45:58,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3739740.0, ans=0.09899494936611666 2023-11-29 00:46:00,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3739740.0, ans=0.035 2023-11-29 00:46:41,915 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561000 2023-11-29 00:46:45,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3740006.6666666665, ans=0.0 2023-11-29 00:46:46,328 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7900, loss[loss=0.04892, simple_loss=0.06147, pruned_loss=0.009211, audio_tagging_loss=0.008979, over 16970.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08944, pruned_loss=0.01191, audio_tagging_loss=0.008647, over 3046034.35 frames. ], batch size: 65, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:46:51,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=22.5 2023-11-29 00:47:08,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3740073.3333333335, ans=0.125 2023-11-29 00:47:16,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 9.161e+01 9.815e+01 1.045e+02 1.564e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-29 00:47:26,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2023-11-29 00:47:30,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3740206.6666666665, ans=0.125 2023-11-29 00:47:32,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3740206.6666666665, ans=0.1 2023-11-29 00:47:44,238 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561050 2023-11-29 00:47:48,096 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7950, loss[loss=0.0623, simple_loss=0.08051, pruned_loss=0.01188, audio_tagging_loss=0.01017, over 15469.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08916, pruned_loss=0.01171, audio_tagging_loss=0.008767, over 3047137.58 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:47:56,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3740340.0, ans=0.0 2023-11-29 00:47:59,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3740406.6666666665, ans=0.035 2023-11-29 00:48:05,059 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:48:45,462 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561100 2023-11-29 00:48:48,849 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8000, loss[loss=0.06013, simple_loss=0.08383, pruned_loss=0.009682, audio_tagging_loss=0.00853, over 14904.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08914, pruned_loss=0.01176, audio_tagging_loss=0.008854, over 3049549.10 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:49:00,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3740740.0, ans=0.2 2023-11-29 00:49:05,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3740740.0, ans=0.2 2023-11-29 00:49:12,736 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:49:16,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3740806.6666666665, ans=0.125 2023-11-29 00:49:18,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.940e+01 9.510e+01 1.008e+02 1.278e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 00:49:32,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3740873.3333333335, ans=0.125 2023-11-29 00:49:37,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3740940.0, ans=0.125 2023-11-29 00:49:45,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3740940.0, ans=0.2 2023-11-29 00:49:46,724 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561150 2023-11-29 00:49:51,190 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8050, loss[loss=0.0729, simple_loss=0.1, pruned_loss=0.01683, audio_tagging_loss=0.00605, over 16287.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09005, pruned_loss=0.01197, audio_tagging_loss=0.008813, over 3053090.17 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:50:12,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3741073.3333333335, ans=0.125 2023-11-29 00:50:16,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=15.0 2023-11-29 00:50:36,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3741206.6666666665, ans=0.0 2023-11-29 00:50:48,716 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561200 2023-11-29 00:50:52,564 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8100, loss[loss=0.07352, simple_loss=0.09866, pruned_loss=0.0145, audio_tagging_loss=0.009691, over 15609.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08919, pruned_loss=0.01185, audio_tagging_loss=0.008859, over 3049678.08 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:51:06,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3741406.6666666665, ans=0.125 2023-11-29 00:51:22,822 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.038e+01 9.545e+01 1.047e+02 1.336e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 00:51:33,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=12.0 2023-11-29 00:51:48,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3741606.6666666665, ans=0.125 2023-11-29 00:51:50,123 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561250 2023-11-29 00:51:53,571 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8150, loss[loss=0.05819, simple_loss=0.07413, pruned_loss=0.01215, audio_tagging_loss=0.008973, over 14381.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08845, pruned_loss=0.01173, audio_tagging_loss=0.008675, over 3053749.36 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:52:14,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741740.0, ans=0.1 2023-11-29 00:52:32,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=22.5 2023-11-29 00:52:35,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3741873.3333333335, ans=0.05 2023-11-29 00:52:51,069 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561300 2023-11-29 00:52:55,182 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8200, loss[loss=0.0595, simple_loss=0.08582, pruned_loss=0.009752, audio_tagging_loss=0.006833, over 15557.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08888, pruned_loss=0.01193, audio_tagging_loss=0.008545, over 3056772.56 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:52:58,135 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:52:58,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. 
limit=6.0 2023-11-29 00:52:59,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2023-11-29 00:53:04,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2023-11-29 00:53:22,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3742140.0, ans=0.2 2023-11-29 00:53:25,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 9.120e+01 9.757e+01 1.054e+02 1.290e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 00:53:26,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3742140.0, ans=0.0 2023-11-29 00:53:27,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3742140.0, ans=0.1 2023-11-29 00:53:53,479 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561350 2023-11-29 00:53:57,506 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8250, loss[loss=0.06341, simple_loss=0.08868, pruned_loss=0.01106, audio_tagging_loss=0.008008, over 14906.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.089, pruned_loss=0.0119, audio_tagging_loss=0.008533, over 3056686.02 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:54:02,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3742340.0, ans=0.125 2023-11-29 00:54:13,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2023-11-29 00:54:44,600 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:54:55,587 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561400 2023-11-29 00:54:59,467 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8300, loss[loss=0.07024, simple_loss=0.1021, pruned_loss=0.009908, audio_tagging_loss=0.009267, over 14282.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.0895, pruned_loss=0.01194, audio_tagging_loss=0.008449, over 3058864.10 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:55:22,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3742806.6666666665, ans=0.125 2023-11-29 00:55:29,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.057e+01 9.720e+01 1.032e+02 1.351e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 00:55:39,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3742873.3333333335, ans=0.04949747468305833 2023-11-29 00:55:48,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3742940.0, ans=0.125 2023-11-29 00:55:56,153 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561450 2023-11-29 00:55:59,608 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8350, loss[loss=0.04491, simple_loss=0.05717, pruned_loss=0.007225, audio_tagging_loss=0.009097, over 15997.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08902, pruned_loss=0.0119, audio_tagging_loss=0.008387, over 3049024.15 frames. 
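[Annotation] Each WARNING above drops a cut whose frame count after subsampling (100 -> 23) is smaller than its token count (24); the transducer loss needs at least one encoder frame per output token, so such cuts are excluded. A sketch of that validity check, assuming the 100 -> 23 reduction follows a convolutional front end of the form ((n - 7) // 2 + 1) // 2; the exact subsampling formula is an assumption, chosen only because it reproduces the counts printed in the warnings:

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed conv subsampling: 100 input frames -> 23 output frames,
    # matching the before/after counts in the warnings above.
    return ((num_frames - 7) // 2 + 1) // 2

def is_valid_cut(num_frames: int, num_tokens: int) -> bool:
    # Keep the cut only if there are at least as many encoder frames
    # as output tokens (T >= U).
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(is_valid_cut(100, 24))          # False -> cut is excluded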
], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:56:05,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3743006.6666666665, ans=0.125 2023-11-29 00:56:24,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3743140.0, ans=0.125 2023-11-29 00:56:24,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3743140.0, ans=0.2 2023-11-29 00:56:25,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3743140.0, ans=0.125 2023-11-29 00:56:26,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3743140.0, ans=0.1 2023-11-29 00:56:37,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3743206.6666666665, ans=0.125 2023-11-29 00:56:39,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3743206.6666666665, ans=0.09899494936611666 2023-11-29 00:56:47,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3743273.3333333335, ans=0.1 2023-11-29 00:56:57,956 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561500 2023-11-29 00:57:01,966 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8400, loss[loss=0.06242, simple_loss=0.08138, pruned_loss=0.01439, audio_tagging_loss=0.007337, over 15313.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.0889, pruned_loss=0.01196, audio_tagging_loss=0.008453, over 3047403.88 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:57:02,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3743340.0, ans=0.0 2023-11-29 00:57:07,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3743340.0, ans=0.015 2023-11-29 00:57:08,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3743340.0, ans=0.0 2023-11-29 00:57:31,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.851e+01 9.361e+01 9.943e+01 1.259e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-29 00:57:40,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3743540.0, ans=0.1 2023-11-29 00:57:51,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3743606.6666666665, ans=0.07 2023-11-29 00:57:51,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3743606.6666666665, ans=0.0 2023-11-29 00:58:00,013 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561550 2023-11-29 00:58:03,565 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8450, loss[loss=0.06239, simple_loss=0.09148, pruned_loss=0.009627, audio_tagging_loss=0.007026, over 15423.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.0888, pruned_loss=0.01189, audio_tagging_loss=0.008508, over 3049331.79 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:58:04,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.48 vs. limit=22.5 2023-11-29 00:58:28,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3743806.6666666665, ans=0.0 2023-11-29 00:58:29,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3743806.6666666665, ans=0.125 2023-11-29 00:58:46,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.44 vs. limit=15.0 2023-11-29 00:58:48,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=22.5 2023-11-29 00:59:01,207 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561600 2023-11-29 00:59:04,921 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8500, loss[loss=0.06075, simple_loss=0.07464, pruned_loss=0.01149, audio_tagging_loss=0.01195, over 14618.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.0888, pruned_loss=0.01191, audio_tagging_loss=0.008561, over 3046807.15 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:59:13,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3744006.6666666665, ans=10.0 2023-11-29 00:59:32,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3744140.0, ans=0.0 2023-11-29 00:59:38,003 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.884e+01 8.998e+01 9.683e+01 1.039e+02 1.237e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-29 00:59:46,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3744206.6666666665, ans=0.1 2023-11-29 00:59:53,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-29 00:59:54,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3744273.3333333335, ans=0.95 2023-11-29 01:00:03,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561650 2023-11-29 01:00:06,545 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8550, loss[loss=0.0767, simple_loss=0.1154, pruned_loss=0.01388, audio_tagging_loss=0.00512, over 15187.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08886, pruned_loss=0.01198, audio_tagging_loss=0.008551, over 3054575.14 frames. 
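[Annotation] The Whitening entries compare a per-module metric against a limit (e.g. metric=16.48 vs. limit=22.5 above); the module is only penalised when the metric exceeds the limit. The metric behaves like a measure of how anisotropic the feature covariance is. A rough sketch under that assumption, using mean(lambda^2) / mean(lambda)^2 over the covariance eigenvalues, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions; this is an illustrative stand-in, not necessarily the module's exact formula:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one module.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]      # (C, C) sample covariance
    eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues, ascending
    # 1.0 when all eigenvalues are equal (white); larger when anisotropic.
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 384)              # roughly white activations
print(whitening_metric(x))              # modestly above 1.0 (sampling noise)
print(whitening_metric(x * torch.linspace(0.1, 3.0, 384)))  # much larger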
], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:00:12,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3744340.0, ans=0.2 2023-11-29 01:00:17,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3744340.0, ans=0.125 2023-11-29 01:00:34,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3744473.3333333335, ans=0.125 2023-11-29 01:01:05,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561700 2023-11-29 01:01:08,379 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:01:09,143 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8600, loss[loss=0.0634, simple_loss=0.08547, pruned_loss=0.009968, audio_tagging_loss=0.01069, over 14674.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08771, pruned_loss=0.01175, audio_tagging_loss=0.008664, over 3054005.97 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:01:40,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.937e+01 9.624e+01 1.044e+02 4.545e+02, threshold=1.925e+02, percent-clipped=1.0 2023-11-29 01:01:59,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3744940.0, ans=0.07 2023-11-29 01:02:06,576 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561750 2023-11-29 01:02:10,021 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8650, loss[loss=0.05915, simple_loss=0.08125, pruned_loss=0.008603, audio_tagging_loss=0.009923, over 15755.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08939, pruned_loss=0.01191, audio_tagging_loss=0.008601, over 3053577.30 frames. ], batch size: 63, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:02:52,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3745206.6666666665, ans=0.05 2023-11-29 01:02:53,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2023-11-29 01:02:57,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=22.5 2023-11-29 01:03:04,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2023-11-29 01:03:06,243 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:03:07,247 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561800 2023-11-29 01:03:11,076 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8700, loss[loss=0.08485, simple_loss=0.1136, pruned_loss=0.02032, audio_tagging_loss=0.007745, over 14373.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08837, pruned_loss=0.0118, audio_tagging_loss=0.008716, over 3062489.37 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:03:17,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3745340.0, ans=0.0 2023-11-29 01:03:44,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 8.978e+01 9.587e+01 1.045e+02 1.358e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 01:03:47,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3745540.0, ans=0.125 2023-11-29 01:04:10,030 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561850 2023-11-29 01:04:14,715 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8750, loss[loss=0.05904, simple_loss=0.08352, pruned_loss=0.007407, audio_tagging_loss=0.009874, over 16011.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08885, pruned_loss=0.01198, audio_tagging_loss=0.00869, over 3061036.76 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 8.0 2023-11-29 01:04:32,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3745740.0, ans=0.2 2023-11-29 01:04:39,334 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:04:41,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3745806.6666666665, ans=0.1 2023-11-29 01:04:44,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3745806.6666666665, ans=0.125 2023-11-29 01:05:11,939 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561900 2023-11-29 01:05:15,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.21 vs. limit=15.0 2023-11-29 01:05:15,317 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8800, loss[loss=0.09785, simple_loss=0.1329, pruned_loss=0.02154, audio_tagging_loss=0.009842, over 15402.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08975, pruned_loss=0.0122, audio_tagging_loss=0.008771, over 3060708.49 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:05:39,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3746140.0, ans=0.0 2023-11-29 01:05:49,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 9.250e+01 9.769e+01 1.064e+02 1.773e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 01:05:52,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3746206.6666666665, ans=0.125 2023-11-29 01:05:55,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746206.6666666665, ans=0.1 2023-11-29 01:06:09,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=15.0 2023-11-29 01:06:12,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561950 2023-11-29 01:06:16,819 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8850, loss[loss=0.08643, simple_loss=0.1267, pruned_loss=0.0176, audio_tagging_loss=0.005469, over 15748.00 frames. 
], tot_loss[loss=0.06537, simple_loss=0.08908, pruned_loss=0.01205, audio_tagging_loss=0.008782, over 3057809.47 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:06:30,486 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:07:06,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3746606.6666666665, ans=0.1 2023-11-29 01:07:14,930 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562000 2023-11-29 01:07:16,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3746606.6666666665, ans=0.125 2023-11-29 01:07:19,269 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8900, loss[loss=0.06139, simple_loss=0.0819, pruned_loss=0.01092, audio_tagging_loss=0.009527, over 15601.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08945, pruned_loss=0.01213, audio_tagging_loss=0.008699, over 3062923.77 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:07:27,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3746673.3333333335, ans=0.1 2023-11-29 01:07:36,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3746740.0, ans=0.125 2023-11-29 01:07:52,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.904e+01 9.438e+01 1.016e+02 1.537e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 01:07:54,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3746873.3333333335, ans=0.125 2023-11-29 01:07:58,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3746873.3333333335, ans=0.125 2023-11-29 01:08:01,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3746873.3333333335, ans=0.0 2023-11-29 01:08:09,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3746940.0, ans=0.0 2023-11-29 01:08:17,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562050 2023-11-29 01:08:18,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3746940.0, ans=0.0 2023-11-29 01:08:20,671 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8950, loss[loss=0.04732, simple_loss=0.06476, pruned_loss=0.005572, audio_tagging_loss=0.009368, over 14628.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08872, pruned_loss=0.01212, audio_tagging_loss=0.00864, over 3058074.81 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:08:42,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3747073.3333333335, ans=0.0 2023-11-29 01:09:00,176 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:09:10,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3747273.3333333335, ans=0.125 2023-11-29 01:09:17,835 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562100 2023-11-29 01:09:21,833 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9000, loss[loss=0.0704, simple_loss=0.09905, pruned_loss=0.01316, audio_tagging_loss=0.00772, over 15345.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09065, pruned_loss=0.01246, audio_tagging_loss=0.008446, over 3054452.88 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:09:21,834 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 01:09:41,162 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8410, 1.8276, 3.4097, 3.0828, 2.9671, 3.0341, 2.8592, 3.0793], device='cuda:3') 2023-11-29 01:10:02,095 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05855, simple_loss=0.05046, pruned_loss=0.005347, audio_tagging_loss=0.02798, over 4681554.00 frames. 2023-11-29 01:10:02,095 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 01:10:05,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0 2023-11-29 01:10:13,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2023-11-29 01:10:22,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2023-11-29 01:10:23,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=22.5 2023-11-29 01:10:34,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.906e+01 9.620e+01 1.037e+02 1.250e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 01:10:46,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=22.5 2023-11-29 01:10:59,226 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562150 2023-11-29 01:11:03,480 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9050, loss[loss=0.06369, simple_loss=0.09019, pruned_loss=0.01066, audio_tagging_loss=0.007934, over 15333.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09031, pruned_loss=0.01238, audio_tagging_loss=0.008468, over 3053808.03 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:11:23,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.02 vs. limit=15.0 2023-11-29 01:11:24,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.19 vs. 
limit=15.0 2023-11-29 01:11:49,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. limit=10.0 2023-11-29 01:11:54,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3747940.0, ans=0.125 2023-11-29 01:12:00,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3747940.0, ans=0.0 2023-11-29 01:12:01,705 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562200 2023-11-29 01:12:05,344 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9100, loss[loss=0.06463, simple_loss=0.09339, pruned_loss=0.009635, audio_tagging_loss=0.008299, over 15826.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.09002, pruned_loss=0.01226, audio_tagging_loss=0.008346, over 3048611.41 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:12:10,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3748006.6666666665, ans=0.0 2023-11-29 01:12:13,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0 2023-11-29 01:12:22,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3748073.3333333335, ans=0.125 2023-11-29 01:12:38,335 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.957e+01 8.933e+01 9.563e+01 1.020e+02 1.667e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 01:12:44,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3748206.6666666665, ans=10.0 2023-11-29 01:12:46,909 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:13:02,982 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562250 2023-11-29 01:13:06,505 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9150, loss[loss=0.04539, simple_loss=0.0667, pruned_loss=0.004556, audio_tagging_loss=0.007484, over 15420.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.09018, pruned_loss=0.01222, audio_tagging_loss=0.008275, over 3055705.95 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:13:11,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2023-11-29 01:13:31,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3748473.3333333335, ans=0.0 2023-11-29 01:13:39,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3748473.3333333335, ans=0.0 2023-11-29 01:13:40,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3748473.3333333335, ans=0.125 2023-11-29 01:14:04,745 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562300 2023-11-29 01:14:08,042 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9200, loss[loss=0.06085, simple_loss=0.08094, pruned_loss=0.01239, audio_tagging_loss=0.007986, over 16277.00 frames. 
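Each optim.py:476 line reports five grad-norm statistics (presumably min, 25%, median, 75%, max over a recent window) alongside Clipping_scale=2.0 and a threshold. In every instance above, the threshold is twice the middle statistic, so the clipping threshold appears to track clipping_scale times the median gradient norm, and percent-clipped=0.0 then means no recent step exceeded it. Checking against the 01:12:38 line above:

    # The optim.py threshold appears to be clipping_scale x median grad-norm.
    # Quartiles from the 01:12:38 line above (assumed min/25%/50%/75%/max):
    quartiles = [6.957e+01, 8.933e+01, 9.563e+01, 1.020e+02, 1.667e+02]
    clipping_scale = 2.0
    threshold = clipping_scale * quartiles[2]
    assert abs(threshold - 1.913e+02) < 0.1  # 191.26 vs. the logged 1.913e+02
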
], tot_loss[loss=0.06594, simple_loss=0.09037, pruned_loss=0.01243, audio_tagging_loss=0.008324, over 3055903.88 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:14:21,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2023-11-29 01:14:41,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.164e+01 9.710e+01 1.033e+02 1.295e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 01:15:06,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562350 2023-11-29 01:15:06,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3748940.0, ans=0.125 2023-11-29 01:15:10,317 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9250, loss[loss=0.05483, simple_loss=0.07002, pruned_loss=0.01049, audio_tagging_loss=0.009337, over 15161.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08979, pruned_loss=0.01226, audio_tagging_loss=0.008331, over 3058977.26 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:15:24,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3749073.3333333335, ans=0.0 2023-11-29 01:16:07,616 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562400 2023-11-29 01:16:11,602 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9300, loss[loss=0.06599, simple_loss=0.09133, pruned_loss=0.01214, audio_tagging_loss=0.008187, over 15129.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.0898, pruned_loss=0.01221, audio_tagging_loss=0.00845, over 3058941.46 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:16:21,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-29 01:16:35,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3749473.3333333335, ans=0.125 2023-11-29 01:16:45,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 9.046e+01 9.645e+01 1.038e+02 1.624e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 01:16:59,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2023-11-29 01:17:07,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3749606.6666666665, ans=0.125 2023-11-29 01:17:08,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3749606.6666666665, ans=0.125 2023-11-29 01:17:09,969 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562450 2023-11-29 01:17:13,310 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9350, loss[loss=0.07453, simple_loss=0.08901, pruned_loss=0.01927, audio_tagging_loss=0.01075, over 13353.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08935, pruned_loss=0.01216, audio_tagging_loss=0.008527, over 3056164.57 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:17:46,832 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:17:49,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3749873.3333333335, ans=0.125 2023-11-29 01:18:08,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3749940.0, ans=10.0 2023-11-29 01:18:10,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562500 2023-11-29 01:18:15,246 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9400, loss[loss=0.06357, simple_loss=0.08592, pruned_loss=0.01197, audio_tagging_loss=0.008637, over 14654.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08995, pruned_loss=0.01226, audio_tagging_loss=0.008582, over 3051462.93 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:18:48,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 9.214e+01 9.788e+01 1.040e+02 1.202e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 01:18:57,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3750206.6666666665, ans=0.0 2023-11-29 01:19:12,675 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562550 2023-11-29 01:19:16,057 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9450, loss[loss=0.0726, simple_loss=0.09178, pruned_loss=0.01787, audio_tagging_loss=0.008842, over 13976.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08955, pruned_loss=0.01207, audio_tagging_loss=0.008637, over 3049862.36 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:19:17,782 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:19:41,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.19 vs. 
limit=15.0 2023-11-29 01:19:44,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3750473.3333333335, ans=0.125 2023-11-29 01:19:46,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3750473.3333333335, ans=0.125 2023-11-29 01:19:53,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3750540.0, ans=0.125 2023-11-29 01:19:55,479 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:20:00,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3750540.0, ans=0.1 2023-11-29 01:20:15,308 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562600 2023-11-29 01:20:19,010 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9500, loss[loss=0.04983, simple_loss=0.06334, pruned_loss=0.009532, audio_tagging_loss=0.008621, over 14439.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08992, pruned_loss=0.01202, audio_tagging_loss=0.00862, over 3044820.83 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:20:27,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3750673.3333333335, ans=0.0 2023-11-29 01:20:41,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3750740.0, ans=0.125 2023-11-29 01:20:41,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3750740.0, ans=0.125 2023-11-29 01:20:43,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2023-11-29 01:20:43,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750806.6666666665, ans=0.1 2023-11-29 01:20:53,736 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.884e+01 9.617e+01 1.043e+02 1.271e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 01:20:56,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3750873.3333333335, ans=0.125 2023-11-29 01:20:57,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3750873.3333333335, ans=0.0 2023-11-29 01:21:17,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562650 2023-11-29 01:21:20,486 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9550, loss[loss=0.06777, simple_loss=0.1019, pruned_loss=0.01043, audio_tagging_loss=0.006408, over 15563.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08997, pruned_loss=0.01203, audio_tagging_loss=0.008733, over 3050238.20 frames. 
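The grad_scale field moves between 8.0, 16.0, and 32.0 in these records, e.g. 32.0 at batch 9450 falling to 16.0 at batch 9500 above. This is the usual dynamic fp16 loss-scaling behavior: the scaler halves the scale when a step produces inf/nan gradients and grows it back after a run of clean steps. A minimal sketch with PyTorch's stock scaler; the growth_interval value here is illustrative, not necessarily what this recipe uses:

    # Sketch of the fp16 loss scaling behind the logged grad_scale values.
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)
    # Typical step:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()   # halves the scale after an overflow, doubles it
    #                     # after growth_interval consecutive clean steps
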
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:21:53,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3751140.0, ans=0.95 2023-11-29 01:21:59,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3751206.6666666665, ans=0.5 2023-11-29 01:22:19,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562700 2023-11-29 01:22:22,945 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9600, loss[loss=0.07803, simple_loss=0.104, pruned_loss=0.0182, audio_tagging_loss=0.007834, over 15246.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09008, pruned_loss=0.01197, audio_tagging_loss=0.008817, over 3047403.69 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:22:32,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3751340.0, ans=0.2 2023-11-29 01:22:57,406 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.993e+01 9.667e+01 1.037e+02 1.328e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 01:22:57,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3751473.3333333335, ans=0.0 2023-11-29 01:22:59,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0 2023-11-29 01:23:00,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3751540.0, ans=0.125 2023-11-29 01:23:15,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3751606.6666666665, ans=0.0 2023-11-29 01:23:21,930 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562750 2023-11-29 01:23:25,376 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9650, loss[loss=0.06468, simple_loss=0.09034, pruned_loss=0.009714, audio_tagging_loss=0.009793, over 16340.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08951, pruned_loss=0.01196, audio_tagging_loss=0.008866, over 3043742.38 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:23:51,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3751806.6666666665, ans=0.2 2023-11-29 01:24:11,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2023-11-29 01:24:22,976 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562800 2023-11-29 01:24:24,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3751940.0, ans=0.1 2023-11-29 01:24:26,713 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9700, loss[loss=0.06495, simple_loss=0.08669, pruned_loss=0.01446, audio_tagging_loss=0.007144, over 14305.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08923, pruned_loss=0.012, audio_tagging_loss=0.008718, over 3041386.64 frames. 
], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:24:29,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3752006.6666666665, ans=0.125 2023-11-29 01:24:43,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2023-11-29 01:24:52,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3752140.0, ans=0.125 2023-11-29 01:24:56,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3752140.0, ans=0.125 2023-11-29 01:25:01,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2023-11-29 01:25:01,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.050e+01 9.542e+01 1.032e+02 1.533e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:25:24,922 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562850 2023-11-29 01:25:26,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3752273.3333333335, ans=0.125 2023-11-29 01:25:28,341 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9750, loss[loss=0.06742, simple_loss=0.08259, pruned_loss=0.01596, audio_tagging_loss=0.01017, over 14700.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08912, pruned_loss=0.01194, audio_tagging_loss=0.008559, over 3046396.75 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:25:29,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3752340.0, ans=0.1 2023-11-29 01:25:36,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3752340.0, ans=0.125 2023-11-29 01:25:39,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3752340.0, ans=0.0 2023-11-29 01:26:18,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3752606.6666666665, ans=0.125 2023-11-29 01:26:23,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3752606.6666666665, ans=0.125 2023-11-29 01:26:26,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3752606.6666666665, ans=0.125 2023-11-29 01:26:28,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562900 2023-11-29 01:26:31,444 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9800, loss[loss=0.05565, simple_loss=0.07893, pruned_loss=0.008153, audio_tagging_loss=0.008039, over 16217.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08901, pruned_loss=0.0119, audio_tagging_loss=0.008508, over 3045026.98 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:26:31,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. 
limit=15.0 2023-11-29 01:26:48,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3752740.0, ans=0.125 2023-11-29 01:26:48,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0 2023-11-29 01:26:56,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3752806.6666666665, ans=0.1 2023-11-29 01:27:01,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3752806.6666666665, ans=0.05 2023-11-29 01:27:04,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.999e+01 9.540e+01 1.035e+02 1.290e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:27:07,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3752873.3333333335, ans=0.0 2023-11-29 01:27:23,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3752940.0, ans=0.125 2023-11-29 01:27:27,924 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:27:27,987 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562950 2023-11-29 01:27:31,215 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9850, loss[loss=0.06753, simple_loss=0.08875, pruned_loss=0.01414, audio_tagging_loss=0.009017, over 15463.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08957, pruned_loss=0.012, audio_tagging_loss=0.008352, over 3050410.55 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:27:32,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3753006.6666666665, ans=0.0 2023-11-29 01:27:53,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3753073.3333333335, ans=0.125 2023-11-29 01:27:59,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3753140.0, ans=0.125 2023-11-29 01:28:20,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3753273.3333333335, ans=0.04949747468305833 2023-11-29 01:28:21,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3753273.3333333335, ans=0.1 2023-11-29 01:28:29,739 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563000 2023-11-29 01:28:33,632 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9900, loss[loss=0.05004, simple_loss=0.06913, pruned_loss=0.007781, audio_tagging_loss=0.007691, over 16008.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08918, pruned_loss=0.0119, audio_tagging_loss=0.008395, over 3056183.73 frames. 
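The scaling.py:213 lines print ScheduledFloat values: hyperparameters such as dropout_p, conv_skip_rate, ff2_skip_rate, and balancer prob that are functions of batch_count rather than constants. By batch_count ~ 3.75e6 they have all settled at their endpoints (0.1, 0.125, 0.0, 0.2, ...). A piecewise-linear schedule in batch count, flat past the last breakpoint, is one plausible reading of these lines; the breakpoints below are illustrative assumptions:

    # Sketch of a ScheduledFloat-style value: piecewise-linear in
    # batch_count, then flat. Breakpoints here are illustrative; by
    # batch_count ~ 3.75e6 every schedule in this log has reached its endpoint.
    def scheduled_float(batch_count, points=((0.0, 0.3), (20000.0, 0.1))):
        (x0, y0), (x1, y1) = points
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    scheduled_float(3745540.0)  # -> 0.1, the endpoint, as in the dropout_p lines
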
], batch size: 62, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:28:49,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3753406.6666666665, ans=0.125 2023-11-29 01:29:09,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.970e+01 9.713e+01 1.037e+02 1.358e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 01:29:10,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3753540.0, ans=0.0 2023-11-29 01:29:10,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3753540.0, ans=0.125 2023-11-29 01:29:14,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=22.5 2023-11-29 01:29:31,831 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563050 2023-11-29 01:29:35,975 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9950, loss[loss=0.08397, simple_loss=0.1117, pruned_loss=0.01968, audio_tagging_loss=0.008422, over 14859.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08912, pruned_loss=0.0119, audio_tagging_loss=0.008428, over 3052386.92 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:29:44,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3753673.3333333335, ans=0.0 2023-11-29 01:30:04,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3753806.6666666665, ans=0.125 2023-11-29 01:30:33,869 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563100 2023-11-29 01:30:34,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3753940.0, ans=0.125 2023-11-29 01:30:37,325 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10000, loss[loss=0.06739, simple_loss=0.09562, pruned_loss=0.009654, audio_tagging_loss=0.009928, over 15787.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08878, pruned_loss=0.01192, audio_tagging_loss=0.008481, over 3052629.16 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:31:02,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3754140.0, ans=0.1 2023-11-29 01:31:13,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 9.037e+01 9.619e+01 1.035e+02 1.339e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 01:31:30,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3754273.3333333335, ans=0.0 2023-11-29 01:31:32,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-29 01:31:35,294 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563150 2023-11-29 01:31:39,215 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10050, loss[loss=0.06258, simple_loss=0.08916, pruned_loss=0.009654, audio_tagging_loss=0.008348, over 14885.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08974, pruned_loss=0.01204, audio_tagging_loss=0.008456, over 3052446.55 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:31:43,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2023-11-29 01:31:45,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-29 01:32:04,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3754473.3333333335, ans=0.0 2023-11-29 01:32:04,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3754473.3333333335, ans=0.125 2023-11-29 01:32:07,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3754473.3333333335, ans=0.1 2023-11-29 01:32:13,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3754473.3333333335, ans=0.125 2023-11-29 01:32:13,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3754473.3333333335, ans=0.0 2023-11-29 01:32:37,208 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563200 2023-11-29 01:32:40,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3754673.3333333335, ans=0.125 2023-11-29 01:32:41,625 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10100, loss[loss=0.05912, simple_loss=0.0791, pruned_loss=0.01245, audio_tagging_loss=0.007121, over 15047.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08956, pruned_loss=0.01198, audio_tagging_loss=0.00852, over 3049575.63 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:33:17,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.162e+01 9.791e+01 1.075e+02 1.682e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 01:33:18,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3754873.3333333335, ans=0.1 2023-11-29 01:33:33,284 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:33:39,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563250 2023-11-29 01:33:43,347 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10150, loss[loss=0.06102, simple_loss=0.08419, pruned_loss=0.009687, audio_tagging_loss=0.009239, over 15881.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.09027, pruned_loss=0.01193, audio_tagging_loss=0.008557, over 3054622.30 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:33:45,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3755006.6666666665, ans=10.0 2023-11-29 01:33:52,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.26 vs. limit=10.0 2023-11-29 01:33:58,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3755073.3333333335, ans=0.125 2023-11-29 01:34:02,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755073.3333333335, ans=0.1 2023-11-29 01:34:11,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3755140.0, ans=0.125 2023-11-29 01:34:13,476 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:34:14,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755140.0, ans=0.1 2023-11-29 01:34:26,692 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:34:40,705 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563300 2023-11-29 01:34:44,792 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10200, loss[loss=0.06283, simple_loss=0.08277, pruned_loss=0.01183, audio_tagging_loss=0.009617, over 15599.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08968, pruned_loss=0.01178, audio_tagging_loss=0.008665, over 3057270.53 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:34:53,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-29 01:35:09,569 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:35:12,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3755473.3333333335, ans=0.0 2023-11-29 01:35:15,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.95 vs. 
limit=15.0 2023-11-29 01:35:21,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.706e+01 9.157e+01 9.642e+01 1.029e+02 1.501e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 01:35:27,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3755540.0, ans=0.125 2023-11-29 01:35:35,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2023-11-29 01:35:42,251 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563350 2023-11-29 01:35:45,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3755673.3333333335, ans=0.125 2023-11-29 01:35:46,268 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10250, loss[loss=0.0555, simple_loss=0.07403, pruned_loss=0.007797, audio_tagging_loss=0.01068, over 15068.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08926, pruned_loss=0.01188, audio_tagging_loss=0.00866, over 3048457.38 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:35:46,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.60 vs. limit=10.0 2023-11-29 01:36:43,820 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563400 2023-11-29 01:36:47,556 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10300, loss[loss=0.07898, simple_loss=0.1071, pruned_loss=0.01815, audio_tagging_loss=0.007257, over 15457.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08909, pruned_loss=0.01188, audio_tagging_loss=0.008697, over 3050552.61 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:36:49,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3756006.6666666665, ans=0.125 2023-11-29 01:37:02,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3756073.3333333335, ans=0.07 2023-11-29 01:37:08,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3756073.3333333335, ans=0.0 2023-11-29 01:37:13,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3756140.0, ans=0.0 2023-11-29 01:37:13,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3756140.0, ans=0.0 2023-11-29 01:37:15,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3756140.0, ans=0.125 2023-11-29 01:37:25,109 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 9.072e+01 9.694e+01 1.048e+02 1.558e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 01:37:30,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3756206.6666666665, ans=0.125 2023-11-29 01:37:40,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3756273.3333333335, ans=0.125 2023-11-29 01:37:42,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3756273.3333333335, ans=0.2 2023-11-29 01:37:42,941 INFO [scaling.py:213] 
(3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3756273.3333333335, ans=0.2 2023-11-29 01:37:46,186 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563450 2023-11-29 01:37:47,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3756273.3333333335, ans=0.125 2023-11-29 01:37:49,657 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10350, loss[loss=0.05874, simple_loss=0.07569, pruned_loss=0.01093, audio_tagging_loss=0.009962, over 16113.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09009, pruned_loss=0.01193, audio_tagging_loss=0.00873, over 3051492.38 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:37:55,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3756340.0, ans=0.2 2023-11-29 01:38:02,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3756406.6666666665, ans=0.0 2023-11-29 01:38:04,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3756406.6666666665, ans=0.0 2023-11-29 01:38:09,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3756406.6666666665, ans=0.125 2023-11-29 01:38:15,288 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:38:16,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3756473.3333333335, ans=0.0 2023-11-29 01:38:19,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.44 vs. limit=10.0 2023-11-29 01:38:19,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3756473.3333333335, ans=0.09899494936611666 2023-11-29 01:38:30,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2023-11-29 01:38:47,898 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563500 2023-11-29 01:38:48,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3756606.6666666665, ans=0.0 2023-11-29 01:38:48,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=12.0 2023-11-29 01:38:51,251 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10400, loss[loss=0.07005, simple_loss=0.08762, pruned_loss=0.01625, audio_tagging_loss=0.009991, over 15779.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08997, pruned_loss=0.01205, audio_tagging_loss=0.008793, over 3058341.87 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:38:52,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.98 vs. 
limit=15.0 2023-11-29 01:38:54,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3756673.3333333335, ans=0.125 2023-11-29 01:38:59,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3756673.3333333335, ans=10.0 2023-11-29 01:39:26,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3756806.6666666665, ans=0.125 2023-11-29 01:39:28,493 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.225e+01 9.627e+01 1.037e+02 1.431e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 01:39:41,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2023-11-29 01:39:46,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3756940.0, ans=0.125 2023-11-29 01:39:49,655 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563550 2023-11-29 01:39:53,059 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10450, loss[loss=0.04485, simple_loss=0.05505, pruned_loss=0.005996, audio_tagging_loss=0.01133, over 15488.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08995, pruned_loss=0.01198, audio_tagging_loss=0.008737, over 3054992.28 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:40:42,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-29 01:40:49,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563600 2023-11-29 01:40:54,490 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10500, loss[loss=0.08219, simple_loss=0.1115, pruned_loss=0.01936, audio_tagging_loss=0.007074, over 14184.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08973, pruned_loss=0.01193, audio_tagging_loss=0.008648, over 3053268.62 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:41:22,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3757473.3333333335, ans=0.0 2023-11-29 01:41:24,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2023-11-29 01:41:29,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3757473.3333333335, ans=0.125 2023-11-29 01:41:31,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.902e+01 9.602e+01 1.050e+02 1.360e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 01:41:49,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-11-29 01:41:52,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563650 2023-11-29 01:41:53,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.32 vs. 
limit=22.5 2023-11-29 01:41:55,917 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10550, loss[loss=0.05467, simple_loss=0.07584, pruned_loss=0.009153, audio_tagging_loss=0.007597, over 14102.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09002, pruned_loss=0.01218, audio_tagging_loss=0.008501, over 3044827.56 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:42:05,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3757673.3333333335, ans=0.125 2023-11-29 01:42:27,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3757806.6666666665, ans=0.125 2023-11-29 01:42:34,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3757873.3333333335, ans=0.125 2023-11-29 01:42:54,240 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563700 2023-11-29 01:42:54,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3757940.0, ans=0.125 2023-11-29 01:42:57,610 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10600, loss[loss=0.05478, simple_loss=0.07358, pruned_loss=0.01009, audio_tagging_loss=0.007899, over 14164.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.09007, pruned_loss=0.0122, audio_tagging_loss=0.008445, over 3039656.02 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:43:23,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3758140.0, ans=0.125 2023-11-29 01:43:25,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3758140.0, ans=0.125 2023-11-29 01:43:25,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0 2023-11-29 01:43:34,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 9.127e+01 9.716e+01 1.043e+02 1.257e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 01:43:54,646 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563750 2023-11-29 01:43:58,025 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10650, loss[loss=0.06864, simple_loss=0.09413, pruned_loss=0.01266, audio_tagging_loss=0.008921, over 15265.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09031, pruned_loss=0.01223, audio_tagging_loss=0.008394, over 3042736.88 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:44:18,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3758406.6666666665, ans=0.125 2023-11-29 01:44:18,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3758406.6666666665, ans=0.125 2023-11-29 01:44:28,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.92 vs. limit=6.0 2023-11-29 01:44:56,320 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563800 2023-11-29 01:45:00,060 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10700, loss[loss=0.06205, simple_loss=0.08117, pruned_loss=0.01265, audio_tagging_loss=0.008812, over 15287.00 frames. 
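The scaling.py:1022 Whitening lines compare a per-module whiteness metric against a limit. Most stay well under it (e.g. metric=4.53 vs. limit=15.0 above), and occasionally one exceeds it (metric=15.19 vs. limit=15.0 earlier in this stretch), which is presumably when the whitening penalty engages to push the module's channel covariance back toward isotropy. The metric definition below is an illustrative stand-in (a covariance-power ratio that equals 1.0 for perfectly white features and grows as the spectrum concentrates), not scaling.py's actual formula:

    # Sketch of the "metric vs. limit" check in the Whitening lines.
    import torch

    def whitening_metric(x):                      # x: (frames, channels)
        cov = (x.T @ x) / x.shape[0]              # channel covariance
        # mean squared eigenvalue over squared mean eigenvalue; 1.0 when white
        return (cov ** 2).sum() / (torch.diag(cov).mean() ** 2 * x.shape[1])

    # A penalty would be applied only when whitening_metric(x) > limit.
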
], tot_loss[loss=0.06474, simple_loss=0.08883, pruned_loss=0.01184, audio_tagging_loss=0.008478, over 3042762.96 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:45:01,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3758673.3333333335, ans=0.0 2023-11-29 01:45:15,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3758740.0, ans=0.1 2023-11-29 01:45:17,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3758740.0, ans=0.125 2023-11-29 01:45:24,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3758806.6666666665, ans=0.2 2023-11-29 01:45:25,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3758806.6666666665, ans=0.125 2023-11-29 01:45:27,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2023-11-29 01:45:37,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.610e+01 9.369e+01 1.025e+02 1.277e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-29 01:45:58,447 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563850 2023-11-29 01:46:01,875 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10750, loss[loss=0.06698, simple_loss=0.09409, pruned_loss=0.0119, audio_tagging_loss=0.008032, over 16452.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.08768, pruned_loss=0.01167, audio_tagging_loss=0.008536, over 3042563.63 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:46:17,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3759073.3333333335, ans=0.0 2023-11-29 01:46:42,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-29 01:46:42,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-29 01:46:45,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=22.5 2023-11-29 01:46:47,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-29 01:46:53,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3759273.3333333335, ans=0.125 2023-11-29 01:46:59,042 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563900 2023-11-29 01:47:00,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3759273.3333333335, ans=0.1 2023-11-29 01:47:02,468 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10800, loss[loss=0.04158, simple_loss=0.05806, pruned_loss=0.004177, audio_tagging_loss=0.008379, over 16046.00 frames. ], tot_loss[loss=0.06372, simple_loss=0.08751, pruned_loss=0.01152, audio_tagging_loss=0.00844, over 3045960.44 frames. 
], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:47:19,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=22.5 2023-11-29 01:47:41,318 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.091e+01 9.540e+01 1.017e+02 1.841e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:47:45,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3759540.0, ans=0.2 2023-11-29 01:48:00,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563950 2023-11-29 01:48:03,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3759673.3333333335, ans=0.125 2023-11-29 01:48:04,378 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10850, loss[loss=0.07775, simple_loss=0.1029, pruned_loss=0.01484, audio_tagging_loss=0.01148, over 15884.00 frames. ], tot_loss[loss=0.06365, simple_loss=0.08725, pruned_loss=0.01151, audio_tagging_loss=0.008523, over 3048044.34 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:48:10,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3759673.3333333335, ans=0.125 2023-11-29 01:48:40,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3759873.3333333335, ans=0.0 2023-11-29 01:48:50,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3759873.3333333335, ans=0.2 2023-11-29 01:48:51,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3759873.3333333335, ans=0.0 2023-11-29 01:49:02,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3759940.0, ans=0.125 2023-11-29 01:49:03,684 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564000 2023-11-29 01:49:09,321 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:49:10,534 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10900, loss[loss=0.07097, simple_loss=0.1019, pruned_loss=0.01234, audio_tagging_loss=0.007695, over 15325.00 frames. ], tot_loss[loss=0.06414, simple_loss=0.08809, pruned_loss=0.01158, audio_tagging_loss=0.008509, over 3047473.59 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:49:35,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3760140.0, ans=0.125 2023-11-29 01:49:35,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3760140.0, ans=0.125 2023-11-29 01:49:39,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3760140.0, ans=0.125 2023-11-29 01:49:43,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3760140.0, ans=0.2 2023-11-29 01:49:48,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.098e+01 9.720e+01 1.048e+02 1.470e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 01:49:53,261 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:50:07,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564050 2023-11-29 01:50:10,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3760340.0, ans=0.125 2023-11-29 01:50:11,353 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10950, loss[loss=0.0598, simple_loss=0.08824, pruned_loss=0.009471, audio_tagging_loss=0.006214, over 15584.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08893, pruned_loss=0.01183, audio_tagging_loss=0.008479, over 3056588.81 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:50:17,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3760340.0, ans=0.125 2023-11-29 01:50:22,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3760406.6666666665, ans=0.0 2023-11-29 01:50:27,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-29 01:50:47,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3760540.0, ans=0.0 2023-11-29 01:51:08,307 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564100 2023-11-29 01:51:12,309 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11000, loss[loss=0.06509, simple_loss=0.0907, pruned_loss=0.01226, audio_tagging_loss=0.007485, over 15493.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08914, pruned_loss=0.01189, audio_tagging_loss=0.008523, over 3051783.13 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:51:19,527 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:51:23,471 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 01:51:35,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3760740.0, ans=0.0 2023-11-29 01:51:51,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.918e+01 9.080e+01 9.821e+01 1.045e+02 1.365e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-29 01:51:59,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760873.3333333335, ans=0.1 2023-11-29 01:52:09,324 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564150 2023-11-29 01:52:14,026 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11050, loss[loss=0.06444, simple_loss=0.08804, pruned_loss=0.01222, audio_tagging_loss=0.008197, over 15688.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08891, pruned_loss=0.01175, audio_tagging_loss=0.008639, over 3056637.66 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:52:39,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3761140.0, ans=0.125 2023-11-29 01:52:50,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2023-11-29 01:53:02,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3761206.6666666665, ans=0.035 2023-11-29 01:53:06,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2023-11-29 01:53:12,839 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564200 2023-11-29 01:53:16,661 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11100, loss[loss=0.05541, simple_loss=0.08048, pruned_loss=0.008053, audio_tagging_loss=0.007116, over 15292.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08863, pruned_loss=0.01173, audio_tagging_loss=0.008751, over 3047945.72 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:53:56,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 9.017e+01 9.701e+01 1.046e+02 1.396e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 01:54:01,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3761540.0, ans=0.125 2023-11-29 01:54:11,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3761606.6666666665, ans=0.1 2023-11-29 01:54:11,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3761606.6666666665, ans=0.2 2023-11-29 01:54:13,910 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564250 2023-11-29 01:54:17,350 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11150, loss[loss=0.07624, simple_loss=0.1061, pruned_loss=0.01457, audio_tagging_loss=0.008596, over 15011.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08905, pruned_loss=0.01177, audio_tagging_loss=0.008835, over 3047883.60 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:54:17,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3761673.3333333335, ans=0.125 2023-11-29 01:54:24,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3761673.3333333335, ans=0.125 2023-11-29 01:55:03,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3761873.3333333335, ans=0.2 2023-11-29 01:55:03,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3761873.3333333335, ans=0.1 2023-11-29 01:55:15,801 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564300 2023-11-29 01:55:17,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3761940.0, ans=0.0 2023-11-29 01:55:19,852 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11200, loss[loss=0.07391, simple_loss=0.1069, pruned_loss=0.01304, audio_tagging_loss=0.007398, over 15957.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08836, pruned_loss=0.01171, audio_tagging_loss=0.008938, over 3044492.95 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:55:23,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3762006.6666666665, ans=0.2 2023-11-29 01:55:38,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3762073.3333333335, ans=0.0 2023-11-29 01:55:58,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.132e+01 8.938e+01 9.585e+01 1.042e+02 1.236e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 01:56:17,911 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564350 2023-11-29 01:56:21,335 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11250, loss[loss=0.05367, simple_loss=0.07222, pruned_loss=0.01006, audio_tagging_loss=0.007495, over 15159.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08731, pruned_loss=0.01162, audio_tagging_loss=0.009012, over 3048060.54 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:56:25,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3762340.0, ans=0.0 2023-11-29 01:56:27,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3762340.0, ans=0.0 2023-11-29 01:56:40,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3762406.6666666665, ans=0.0 2023-11-29 01:56:58,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3762540.0, ans=0.1 2023-11-29 01:56:59,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3762540.0, ans=0.125 2023-11-29 01:57:01,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3762540.0, ans=0.125 2023-11-29 01:57:06,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3762540.0, ans=0.09899494936611666 2023-11-29 01:57:07,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3762540.0, ans=0.0 2023-11-29 01:57:17,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3762606.6666666665, ans=0.125 2023-11-29 01:57:19,347 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564400 2023-11-29 01:57:23,262 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11300, loss[loss=0.06932, simple_loss=0.1041, pruned_loss=0.01179, audio_tagging_loss=0.005459, over 16014.00 frames. ], tot_loss[loss=0.06403, simple_loss=0.08709, pruned_loss=0.01164, audio_tagging_loss=0.008847, over 3049988.34 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:57:39,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3762740.0, ans=0.125 2023-11-29 01:57:50,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=22.5 2023-11-29 01:58:02,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3762873.3333333335, ans=0.1 2023-11-29 01:58:04,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 9.082e+01 9.505e+01 1.023e+02 1.248e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-29 01:58:21,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564450 2023-11-29 01:58:24,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3763006.6666666665, ans=0.125 2023-11-29 01:58:24,959 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11350, loss[loss=0.06178, simple_loss=0.08064, pruned_loss=0.01369, audio_tagging_loss=0.007774, over 15795.00 frames. ], tot_loss[loss=0.06381, simple_loss=0.08713, pruned_loss=0.01156, audio_tagging_loss=0.008683, over 3046124.96 frames. 
], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:58:42,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3763073.3333333335, ans=0.5 2023-11-29 01:58:48,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=22.5 2023-11-29 01:58:54,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3763140.0, ans=0.1 2023-11-29 01:58:57,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2023-11-29 01:59:04,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-11-29 01:59:05,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3763206.6666666665, ans=0.07 2023-11-29 01:59:05,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3763206.6666666665, ans=0.125 2023-11-29 01:59:16,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0 2023-11-29 01:59:20,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3763273.3333333335, ans=0.1 2023-11-29 01:59:21,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0 2023-11-29 01:59:22,740 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564500 2023-11-29 01:59:26,234 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11400, loss[loss=0.04534, simple_loss=0.05799, pruned_loss=0.006006, audio_tagging_loss=0.01034, over 16704.00 frames. ], tot_loss[loss=0.06393, simple_loss=0.08741, pruned_loss=0.0116, audio_tagging_loss=0.00862, over 3045669.96 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:59:27,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3763340.0, ans=0.125 2023-11-29 01:59:28,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3763340.0, ans=0.125 2023-11-29 01:59:41,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3763406.6666666665, ans=0.0 2023-11-29 01:59:48,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3763406.6666666665, ans=0.125 2023-11-29 01:59:48,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3763406.6666666665, ans=0.0 2023-11-29 01:59:50,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3763473.3333333335, ans=0.125 2023-11-29 01:59:55,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. 
limit=15.0 2023-11-29 01:59:56,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3763473.3333333335, ans=0.1 2023-11-29 01:59:57,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3763473.3333333335, ans=0.125 2023-11-29 02:00:03,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3763540.0, ans=0.1 2023-11-29 02:00:06,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.845e+01 9.645e+01 1.031e+02 1.502e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 02:00:23,910 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564550 2023-11-29 02:00:27,288 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11450, loss[loss=0.06381, simple_loss=0.08374, pruned_loss=0.012, audio_tagging_loss=0.009931, over 14466.00 frames. ], tot_loss[loss=0.06381, simple_loss=0.08714, pruned_loss=0.0116, audio_tagging_loss=0.008643, over 3040030.55 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:00:28,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3763673.3333333335, ans=0.2 2023-11-29 02:00:48,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3763740.0, ans=0.125 2023-11-29 02:00:59,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3763806.6666666665, ans=0.125 2023-11-29 02:01:10,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3763873.3333333335, ans=0.09899494936611666 2023-11-29 02:01:24,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564600 2023-11-29 02:01:26,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3763940.0, ans=0.5 2023-11-29 02:01:28,642 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11500, loss[loss=0.05903, simple_loss=0.07262, pruned_loss=0.01134, audio_tagging_loss=0.01139, over 15197.00 frames. ], tot_loss[loss=0.0638, simple_loss=0.08726, pruned_loss=0.01155, audio_tagging_loss=0.008625, over 3049664.19 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:01:56,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3764140.0, ans=0.0 2023-11-29 02:02:09,665 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.942e+01 9.581e+01 1.022e+02 1.259e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-29 02:02:13,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3764206.6666666665, ans=0.2 2023-11-29 02:02:14,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.21 vs. limit=10.0 2023-11-29 02:02:19,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.86 vs. 
limit=12.0 2023-11-29 02:02:26,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3764273.3333333335, ans=0.0 2023-11-29 02:02:26,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3764273.3333333335, ans=0.07 2023-11-29 02:02:27,419 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564650 2023-11-29 02:02:30,866 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11550, loss[loss=0.05151, simple_loss=0.06919, pruned_loss=0.009024, audio_tagging_loss=0.007894, over 15246.00 frames. ], tot_loss[loss=0.06363, simple_loss=0.0873, pruned_loss=0.01146, audio_tagging_loss=0.008516, over 3051806.34 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:02:32,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3764340.0, ans=0.125 2023-11-29 02:02:48,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3764406.6666666665, ans=0.125 2023-11-29 02:03:02,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3764473.3333333335, ans=0.125 2023-11-29 02:03:06,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3764540.0, ans=0.125 2023-11-29 02:03:09,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3764540.0, ans=0.1 2023-11-29 02:03:10,454 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:03:28,020 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564700 2023-11-29 02:03:29,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3764606.6666666665, ans=0.125 2023-11-29 02:03:32,101 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11600, loss[loss=0.09122, simple_loss=0.1332, pruned_loss=0.01837, audio_tagging_loss=0.006264, over 16403.00 frames. ], tot_loss[loss=0.06382, simple_loss=0.08754, pruned_loss=0.01158, audio_tagging_loss=0.00847, over 3051839.35 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 02:03:44,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3764740.0, ans=0.0 2023-11-29 02:03:44,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.87 vs. 
limit=12.0 2023-11-29 02:04:13,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 8.912e+01 9.637e+01 1.036e+02 1.418e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 02:04:22,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3764940.0, ans=0.125 2023-11-29 02:04:24,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3764940.0, ans=0.1 2023-11-29 02:04:27,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3764940.0, ans=0.125 2023-11-29 02:04:29,751 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564750 2023-11-29 02:04:33,231 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11650, loss[loss=0.04633, simple_loss=0.05692, pruned_loss=0.007475, audio_tagging_loss=0.01039, over 14578.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.0891, pruned_loss=0.01184, audio_tagging_loss=0.00847, over 3046137.33 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:04:38,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3765006.6666666665, ans=0.2 2023-11-29 02:04:52,214 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:04:53,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3765073.3333333335, ans=0.1 2023-11-29 02:05:01,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2023-11-29 02:05:21,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3765273.3333333335, ans=0.015 2023-11-29 02:05:24,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3765273.3333333335, ans=0.0 2023-11-29 02:05:31,000 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564800 2023-11-29 02:05:34,750 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11700, loss[loss=0.07185, simple_loss=0.09504, pruned_loss=0.01285, audio_tagging_loss=0.01148, over 15413.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08844, pruned_loss=0.01186, audio_tagging_loss=0.008551, over 3053126.10 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:05:38,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3765340.0, ans=0.1 2023-11-29 02:05:39,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3765340.0, ans=0.1 2023-11-29 02:05:52,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3765406.6666666665, ans=0.125 2023-11-29 02:06:16,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.764e+01 9.502e+01 1.022e+02 1.260e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 02:06:17,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. 
limit=15.0 2023-11-29 02:06:19,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.44 vs. limit=10.0 2023-11-29 02:06:21,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3765540.0, ans=0.1 2023-11-29 02:06:22,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3765606.6666666665, ans=0.0 2023-11-29 02:06:32,040 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564850 2023-11-29 02:06:35,542 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11750, loss[loss=0.07717, simple_loss=0.1075, pruned_loss=0.01517, audio_tagging_loss=0.008246, over 14297.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08927, pruned_loss=0.01191, audio_tagging_loss=0.00854, over 3045236.06 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:06:51,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3765740.0, ans=0.125 2023-11-29 02:07:01,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2023-11-29 02:07:24,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3765940.0, ans=0.2 2023-11-29 02:07:33,463 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564900 2023-11-29 02:07:37,662 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11800, loss[loss=0.06649, simple_loss=0.08309, pruned_loss=0.01228, audio_tagging_loss=0.01267, over 14521.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08842, pruned_loss=0.01182, audio_tagging_loss=0.008598, over 3046289.39 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:07:40,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3766006.6666666665, ans=0.2 2023-11-29 02:07:41,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3766006.6666666665, ans=0.125 2023-11-29 02:07:42,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3766006.6666666665, ans=0.125 2023-11-29 02:08:17,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.243e+01 9.839e+01 1.045e+02 1.336e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 02:08:35,188 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564950 2023-11-29 02:08:38,653 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11850, loss[loss=0.0768, simple_loss=0.1053, pruned_loss=0.01252, audio_tagging_loss=0.01162, over 15146.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.088, pruned_loss=0.01183, audio_tagging_loss=0.008666, over 3040634.60 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:08:38,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3766340.0, ans=0.125 2023-11-29 02:08:58,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3766406.6666666665, ans=0.125 2023-11-29 02:09:26,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3766606.6666666665, ans=0.0 2023-11-29 02:09:28,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3766606.6666666665, ans=0.025 2023-11-29 02:09:34,929 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565000 2023-11-29 02:09:38,742 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11900, loss[loss=0.05725, simple_loss=0.07688, pruned_loss=0.008489, audio_tagging_loss=0.01033, over 16006.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0886, pruned_loss=0.01184, audio_tagging_loss=0.0087, over 3041732.09 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:09:41,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.67 vs. limit=10.0 2023-11-29 02:09:58,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2023-11-29 02:10:02,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3766806.6666666665, ans=0.5 2023-11-29 02:10:06,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=22.5 2023-11-29 02:10:13,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-29 02:10:19,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.853e+01 9.543e+01 1.009e+02 1.340e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 02:10:34,500 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565050 2023-11-29 02:10:37,866 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11950, loss[loss=0.03632, simple_loss=0.04239, pruned_loss=0.004659, audio_tagging_loss=0.01046, over 16319.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08786, pruned_loss=0.01173, audio_tagging_loss=0.008895, over 3040954.73 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:10:41,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3767006.6666666665, ans=0.0 2023-11-29 02:10:45,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3767006.6666666665, ans=0.2 2023-11-29 02:10:46,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3767006.6666666665, ans=0.125 2023-11-29 02:10:58,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. 
limit=15.0 2023-11-29 02:11:00,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3767073.3333333335, ans=10.0 2023-11-29 02:11:12,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3767140.0, ans=0.09899494936611666 2023-11-29 02:11:22,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3767206.6666666665, ans=0.125 2023-11-29 02:11:33,265 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565100 2023-11-29 02:11:34,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3767273.3333333335, ans=0.125 2023-11-29 02:11:36,584 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 12000, loss[loss=0.08786, simple_loss=0.1234, pruned_loss=0.01997, audio_tagging_loss=0.006176, over 15742.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08812, pruned_loss=0.0118, audio_tagging_loss=0.008898, over 3044434.10 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 02:11:36,585 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 02:12:01,459 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6153, 3.7070, 4.0212, 3.5041], device='cuda:3') 2023-11-29 02:12:16,764 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05799, simple_loss=0.0505, pruned_loss=0.005391, audio_tagging_loss=0.02735, over 4681554.00 frames. 2023-11-29 02:12:16,765 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 02:12:33,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0 2023-11-29 02:13:00,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3767493.3333333335, ans=0.125 2023-11-29 02:13:01,511 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 0, loss[loss=0.0631, simple_loss=0.06999, pruned_loss=0.008342, audio_tagging_loss=0.01976, over 15043.00 frames. ], tot_loss[loss=0.0631, simple_loss=0.06999, pruned_loss=0.008342, audio_tagging_loss=0.01976, over 15043.00 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:13:01,512 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 02:13:36,852 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05814, simple_loss=0.05045, pruned_loss=0.005317, audio_tagging_loss=0.02759, over 4681554.00 frames. 
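Two reading aids for the records above. First, the logged totals are consistent with a fixed linear combination of the logged components: for the tot_loss at epoch 47, batch 10850, 0.5 * 0.08725 + 0.01151 + 1.0 * 0.008523 = 0.06366, matching the logged loss=0.06365 to rounding. A minimal sketch, assuming those weights (the 0.5 and 1.0 are inferred from the logged numbers, and combined_loss is an illustrative name, not a function in train_asr.py):

# Weights assumed from the arithmetic above; combined_loss is illustrative.
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float) -> float:
    # Recombine the three logged components into the logged total.
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# tot_loss components logged at epoch 47, batch 10850:
assert abs(combined_loss(0.08725, 0.01151, 0.008523) - 0.06365) < 1e-4

Second, the WARNING lines drop cuts that end up with fewer encoder frames than BPE tokens (23 frames vs. 24 tokens in every excluded cut above), presumably because the pruned transducer loss needs at least one subsampled frame per token. A sketch under the assumption that the front-end maps T input frames to roughly (T - 8) // 4 output frames, which reproduces the logged 100 -> 23; keep_cut is likewise an illustrative name:

def keep_cut(num_frames_before: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    # Assumed frame arithmetic: (100 - 8) // 4 = 23 reproduces the logged
    # before/after pair; the real front-end may differ by a frame or two.
    num_frames_after = (num_frames_before - 8) // subsampling_factor
    # Keep only cuts with at least one subsampled frame per token.
    return num_frames_after >= num_tokens

# Every excluded cut above: 100 raw frames -> 23 subsampled, 24 tokens -> dropped.
assert keep_cut(100, 24) is False

The remaining per-record variation, grad_scale stepping between 32.0 and 16.0, is consistent with dynamic loss scaling under fp16 training and needs no reconstruction here.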
2023-11-29 02:13:36,853 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 02:13:43,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3767493.3333333335, ans=0.0 2023-11-29 02:13:50,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.137e+01 9.361e+01 1.012e+02 1.115e+02 1.422e+02, threshold=2.023e+02, percent-clipped=0.0 2023-11-29 02:13:57,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3767560.0, ans=0.125 2023-11-29 02:14:08,484 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565150 2023-11-29 02:14:32,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3767760.0, ans=0.0 2023-11-29 02:14:40,289 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 50, loss[loss=0.07632, simple_loss=0.1009, pruned_loss=0.01304, audio_tagging_loss=0.0128, over 15226.00 frames. ], tot_loss[loss=0.07334, simple_loss=0.08991, pruned_loss=0.01217, audio_tagging_loss=0.01622, over 683062.57 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:15:09,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3767960.0, ans=0.2 2023-11-29 02:15:10,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565200 2023-11-29 02:15:16,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3768026.6666666665, ans=0.1 2023-11-29 02:15:36,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3768093.3333333335, ans=0.5 2023-11-29 02:15:40,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3768093.3333333335, ans=0.125 2023-11-29 02:15:43,429 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 100, loss[loss=0.06436, simple_loss=0.08197, pruned_loss=0.01127, audio_tagging_loss=0.01211, over 16767.00 frames. ], tot_loss[loss=0.07312, simple_loss=0.09096, pruned_loss=0.01207, audio_tagging_loss=0.01557, over 1208161.55 frames. 
], batch size: 62, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:15:44,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3768160.0, ans=0.125 2023-11-29 02:15:49,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3768160.0, ans=0.5 2023-11-29 02:15:56,405 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.236e+01 9.896e+01 1.062e+02 1.155e+02 1.316e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-29 02:15:57,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3768226.6666666665, ans=0.125 2023-11-29 02:16:02,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3768226.6666666665, ans=0.125 2023-11-29 02:16:10,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3768293.3333333335, ans=0.125 2023-11-29 02:16:12,199 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565250 2023-11-29 02:16:21,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3768360.0, ans=0.0 2023-11-29 02:16:32,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3768426.6666666665, ans=0.2 2023-11-29 02:16:35,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2023-11-29 02:16:36,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768426.6666666665, ans=0.1 2023-11-29 02:16:43,669 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 150, loss[loss=0.07009, simple_loss=0.09988, pruned_loss=0.01147, audio_tagging_loss=0.008677, over 15386.00 frames. ], tot_loss[loss=0.07171, simple_loss=0.09151, pruned_loss=0.01197, audio_tagging_loss=0.01398, over 1617902.29 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:16:46,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3768493.3333333335, ans=0.125 2023-11-29 02:16:53,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3768493.3333333335, ans=0.0 2023-11-29 02:16:53,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=3768493.3333333335, ans=15.0 2023-11-29 02:16:58,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3768560.0, ans=0.0 2023-11-29 02:17:02,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3768560.0, ans=0.1 2023-11-29 02:17:07,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3768626.6666666665, ans=0.09899494936611666 2023-11-29 02:17:14,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565300 2023-11-29 02:17:46,576 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 200, loss[loss=0.06167, simple_loss=0.08467, pruned_loss=0.01117, audio_tagging_loss=0.00816, over 16282.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.0899, pruned_loss=0.01181, audio_tagging_loss=0.01241, over 1940456.98 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:18:02,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.150e+01 9.879e+01 1.074e+02 1.273e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 02:18:16,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565350 2023-11-29 02:18:18,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2023-11-29 02:18:49,017 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 250, loss[loss=0.06409, simple_loss=0.08895, pruned_loss=0.007147, audio_tagging_loss=0.01247, over 14478.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.08854, pruned_loss=0.01166, audio_tagging_loss=0.01142, over 2190972.07 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:18:52,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3769160.0, ans=0.125 2023-11-29 02:18:52,223 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:18:53,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-11-29 02:19:18,397 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565400 2023-11-29 02:19:34,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3769360.0, ans=0.125 2023-11-29 02:19:45,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3769426.6666666665, ans=0.125 2023-11-29 02:19:51,083 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 300, loss[loss=0.06895, simple_loss=0.09908, pruned_loss=0.01102, audio_tagging_loss=0.008385, over 14407.00 frames. 
], tot_loss[loss=0.06707, simple_loss=0.08926, pruned_loss=0.01188, audio_tagging_loss=0.01056, over 2378854.76 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:19:57,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2023-11-29 02:19:58,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3769493.3333333335, ans=0.0 2023-11-29 02:20:03,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2023-11-29 02:20:05,685 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.202e+01 9.824e+01 1.066e+02 1.297e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-29 02:20:14,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3769626.6666666665, ans=0.125 2023-11-29 02:20:19,692 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565450 2023-11-29 02:20:35,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3769693.3333333335, ans=0.0 2023-11-29 02:20:51,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3769826.6666666665, ans=0.125 2023-11-29 02:20:52,703 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 350, loss[loss=0.05043, simple_loss=0.07066, pruned_loss=0.006379, audio_tagging_loss=0.00872, over 15714.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08941, pruned_loss=0.01183, audio_tagging_loss=0.01, over 2528406.44 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:21:22,214 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565500 2023-11-29 02:21:31,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3770026.6666666665, ans=0.125 2023-11-29 02:21:53,364 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 400, loss[loss=0.08085, simple_loss=0.1111, pruned_loss=0.01643, audio_tagging_loss=0.008848, over 15769.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08903, pruned_loss=0.01172, audio_tagging_loss=0.009609, over 2646817.10 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:22:09,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.956e+01 9.429e+01 1.009e+02 1.369e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-29 02:22:09,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3770226.6666666665, ans=0.04949747468305833 2023-11-29 02:22:09,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3770226.6666666665, ans=0.125 2023-11-29 02:22:09,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3770226.6666666665, ans=0.125 2023-11-29 02:22:23,910 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565550 2023-11-29 02:22:41,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3770360.0, ans=0.125 2023-11-29 02:22:43,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3770426.6666666665, ans=0.125 2023-11-29 02:22:50,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3770426.6666666665, ans=0.125 2023-11-29 02:22:56,241 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 450, loss[loss=0.06745, simple_loss=0.09762, pruned_loss=0.01161, audio_tagging_loss=0.007033, over 15272.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08953, pruned_loss=0.01191, audio_tagging_loss=0.009363, over 2733429.97 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:22:58,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2023-11-29 02:23:01,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3770493.3333333335, ans=0.0 2023-11-29 02:23:09,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3770560.0, ans=0.125 2023-11-29 02:23:23,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3770626.6666666665, ans=10.0 2023-11-29 02:23:24,850 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565600 2023-11-29 02:23:39,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3770693.3333333335, ans=0.1 2023-11-29 02:23:57,788 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 500, loss[loss=0.07811, simple_loss=0.1044, pruned_loss=0.01647, audio_tagging_loss=0.009442, over 15424.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08883, pruned_loss=0.01179, audio_tagging_loss=0.009235, over 2799016.24 frames. 
], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:24:02,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3770826.6666666665, ans=0.125 2023-11-29 02:24:10,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3770893.3333333335, ans=0.125 2023-11-29 02:24:12,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.944e+01 9.480e+01 1.026e+02 1.531e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 02:24:17,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2023-11-29 02:24:21,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3770960.0, ans=0.125 2023-11-29 02:24:24,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3770960.0, ans=0.1 2023-11-29 02:24:26,893 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565650 2023-11-29 02:24:27,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3770960.0, ans=0.2 2023-11-29 02:24:28,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3770960.0, ans=0.125 2023-11-29 02:24:31,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3770960.0, ans=0.125 2023-11-29 02:24:35,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3771026.6666666665, ans=0.2 2023-11-29 02:24:38,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3771026.6666666665, ans=0.125 2023-11-29 02:24:48,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3771093.3333333335, ans=0.125 2023-11-29 02:24:56,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3771093.3333333335, ans=0.1 2023-11-29 02:24:58,596 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 550, loss[loss=0.07143, simple_loss=0.0851, pruned_loss=0.01665, audio_tagging_loss=0.01223, over 15634.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08963, pruned_loss=0.01194, audio_tagging_loss=0.009071, over 2848386.17 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:25:00,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3771160.0, ans=0.125 2023-11-29 02:25:28,879 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565700 2023-11-29 02:25:40,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3771360.0, ans=0.2 2023-11-29 02:25:58,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2023-11-29 02:26:00,423 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 600, loss[loss=0.05305, simple_loss=0.07569, pruned_loss=0.007531, audio_tagging_loss=0.007671, over 15066.00 frames. 
], tot_loss[loss=0.06577, simple_loss=0.08949, pruned_loss=0.01201, audio_tagging_loss=0.009011, over 2890052.17 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:26:03,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3771493.3333333335, ans=0.2 2023-11-29 02:26:07,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3771493.3333333335, ans=0.09899494936611666 2023-11-29 02:26:16,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.960e+01 9.657e+01 1.065e+02 1.783e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 02:26:26,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3771626.6666666665, ans=0.0 2023-11-29 02:26:30,066 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565750 2023-11-29 02:26:34,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=12.0 2023-11-29 02:26:49,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3771760.0, ans=0.125 2023-11-29 02:27:02,374 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 650, loss[loss=0.04589, simple_loss=0.06388, pruned_loss=0.004876, audio_tagging_loss=0.00908, over 15953.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08945, pruned_loss=0.01198, audio_tagging_loss=0.008881, over 2925197.36 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:27:19,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2023-11-29 02:27:31,198 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565800 2023-11-29 02:27:40,434 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-29 02:27:50,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3772093.3333333335, ans=0.125 2023-11-29 02:28:03,559 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 700, loss[loss=0.04443, simple_loss=0.06253, pruned_loss=0.006651, audio_tagging_loss=0.006514, over 15396.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08887, pruned_loss=0.01178, audio_tagging_loss=0.008871, over 2948062.88 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:28:09,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3772160.0, ans=0.125 2023-11-29 02:28:16,102 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:28:16,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3772226.6666666665, ans=0.05 2023-11-29 02:28:19,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.910e+01 9.498e+01 1.006e+02 1.347e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 02:28:23,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3772226.6666666665, ans=0.2 2023-11-29 02:28:32,911 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565850 2023-11-29 02:28:42,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3772360.0, ans=0.5 2023-11-29 02:29:04,550 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 750, loss[loss=0.06098, simple_loss=0.08911, pruned_loss=0.007181, audio_tagging_loss=0.009244, over 14388.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08878, pruned_loss=0.01175, audio_tagging_loss=0.008847, over 2970179.88 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:29:18,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3772560.0, ans=0.125 2023-11-29 02:29:24,522 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:29:33,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=15.0 2023-11-29 02:29:33,858 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565900 2023-11-29 02:29:44,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3772693.3333333335, ans=0.0 2023-11-29 02:29:52,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3772760.0, ans=0.2 2023-11-29 02:29:53,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3772760.0, ans=0.2 2023-11-29 02:29:59,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2023-11-29 02:30:05,609 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 800, loss[loss=0.06623, simple_loss=0.09303, pruned_loss=0.01026, audio_tagging_loss=0.009454, over 15698.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08947, pruned_loss=0.01181, audio_tagging_loss=0.008807, over 2991376.92 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:30:21,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.034e+01 9.772e+01 1.029e+02 1.331e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 02:30:32,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.07 vs. 
limit=12.0 2023-11-29 02:30:35,162 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565950 2023-11-29 02:30:43,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3773026.6666666665, ans=0.125 2023-11-29 02:30:46,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-11-29 02:30:50,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3773026.6666666665, ans=0.07 2023-11-29 02:30:50,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3773026.6666666665, ans=0.0 2023-11-29 02:31:06,984 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 850, loss[loss=0.06439, simple_loss=0.08887, pruned_loss=0.01178, audio_tagging_loss=0.008172, over 15035.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08995, pruned_loss=0.01185, audio_tagging_loss=0.008778, over 3005814.83 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:31:34,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3773293.3333333335, ans=0.0 2023-11-29 02:31:36,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566000 2023-11-29 02:31:48,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-29 02:31:51,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773360.0, ans=0.1 2023-11-29 02:32:03,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2023-11-29 02:32:09,883 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 900, loss[loss=0.1132, simple_loss=0.1715, pruned_loss=0.02252, audio_tagging_loss=0.004959, over 16182.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09031, pruned_loss=0.012, audio_tagging_loss=0.008824, over 3015767.84 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:32:10,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. 
limit=6.0 2023-11-29 02:32:16,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3773493.3333333335, ans=0.0 2023-11-29 02:32:18,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3773493.3333333335, ans=0.2 2023-11-29 02:32:26,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 9.124e+01 9.810e+01 1.032e+02 1.259e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 02:32:39,586 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566050 2023-11-29 02:32:48,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3773693.3333333335, ans=0.2 2023-11-29 02:33:03,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3773760.0, ans=0.125 2023-11-29 02:33:11,571 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 950, loss[loss=0.06182, simple_loss=0.0906, pruned_loss=0.009326, audio_tagging_loss=0.007194, over 15687.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08981, pruned_loss=0.01183, audio_tagging_loss=0.008751, over 3022929.65 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:33:26,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-29 02:33:31,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=3773893.3333333335, ans=12.0 2023-11-29 02:33:40,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2023-11-29 02:33:42,034 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566100 2023-11-29 02:33:42,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3773960.0, ans=0.125 2023-11-29 02:33:42,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3773960.0, ans=0.0 2023-11-29 02:33:43,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3773960.0, ans=0.125 2023-11-29 02:34:06,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3774093.3333333335, ans=0.1 2023-11-29 02:34:13,535 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1000, loss[loss=0.05032, simple_loss=0.0631, pruned_loss=0.008082, audio_tagging_loss=0.01069, over 14971.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08859, pruned_loss=0.01167, audio_tagging_loss=0.008647, over 3022047.21 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:34:19,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3774160.0, ans=0.125 2023-11-29 02:34:29,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3774226.6666666665, ans=0.125 2023-11-29 02:34:30,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.000e+01 9.678e+01 1.023e+02 1.395e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 02:34:38,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3774293.3333333335, ans=0.125 2023-11-29 02:34:41,581 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:34:42,748 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566150 2023-11-29 02:35:13,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3774426.6666666665, ans=0.95 2023-11-29 02:35:15,266 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1050, loss[loss=0.05978, simple_loss=0.08046, pruned_loss=0.01191, audio_tagging_loss=0.00764, over 15300.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08822, pruned_loss=0.01161, audio_tagging_loss=0.008545, over 3036264.21 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:35:26,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3774560.0, ans=0.125 2023-11-29 02:35:27,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-29 02:35:29,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3774560.0, ans=0.05 2023-11-29 02:35:43,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=12.0 2023-11-29 02:35:44,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566200 2023-11-29 02:35:44,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0 2023-11-29 02:35:49,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3774626.6666666665, ans=0.1 2023-11-29 02:36:01,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3774693.3333333335, ans=0.2 2023-11-29 02:36:10,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3774760.0, ans=0.0 2023-11-29 02:36:16,932 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1100, loss[loss=0.0607, simple_loss=0.07787, pruned_loss=0.01355, audio_tagging_loss=0.00822, over 13991.00 frames. 
], tot_loss[loss=0.06499, simple_loss=0.08939, pruned_loss=0.01183, audio_tagging_loss=0.008467, over 3036971.65 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:36:17,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3774826.6666666665, ans=0.05 2023-11-29 02:36:21,669 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:36:33,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-11-29 02:36:34,686 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.921e+01 9.429e+01 9.964e+01 1.346e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-29 02:36:42,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3774960.0, ans=10.0 2023-11-29 02:36:47,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566250 2023-11-29 02:37:19,254 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1150, loss[loss=0.06011, simple_loss=0.08976, pruned_loss=0.007334, audio_tagging_loss=0.007891, over 15040.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08898, pruned_loss=0.01171, audio_tagging_loss=0.008449, over 3037958.27 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:37:49,326 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566300 2023-11-29 02:38:21,988 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1200, loss[loss=0.06036, simple_loss=0.07727, pruned_loss=0.0121, audio_tagging_loss=0.009626, over 14901.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08992, pruned_loss=0.01199, audio_tagging_loss=0.008438, over 3037559.91 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:38:39,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.292e+01 9.106e+01 9.655e+01 1.032e+02 1.347e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 02:38:51,445 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566350 2023-11-29 02:39:01,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3775693.3333333335, ans=0.2 2023-11-29 02:39:03,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3775693.3333333335, ans=0.0 2023-11-29 02:39:12,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3775760.0, ans=0.2 2023-11-29 02:39:23,534 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1250, loss[loss=0.08521, simple_loss=0.1286, pruned_loss=0.01273, audio_tagging_loss=0.008151, over 15555.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08862, pruned_loss=0.01182, audio_tagging_loss=0.008402, over 3030731.97 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:39:51,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3775960.0, ans=0.025 2023-11-29 02:39:53,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566400 2023-11-29 02:39:56,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3775960.0, ans=0.125 2023-11-29 02:40:04,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3776026.6666666665, ans=0.125 2023-11-29 02:40:07,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3776026.6666666665, ans=0.0 2023-11-29 02:40:09,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.44 vs. limit=10.0 2023-11-29 02:40:25,322 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1300, loss[loss=0.05699, simple_loss=0.08003, pruned_loss=0.01015, audio_tagging_loss=0.006825, over 15710.00 frames. ], tot_loss[loss=0.06359, simple_loss=0.08711, pruned_loss=0.01154, audio_tagging_loss=0.008495, over 3032272.60 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:40:31,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=22.5 2023-11-29 02:40:39,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2023-11-29 02:40:43,993 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.895e+01 9.443e+01 1.023e+02 1.246e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-29 02:40:46,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3776226.6666666665, ans=0.0 2023-11-29 02:40:47,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3776226.6666666665, ans=0.0 2023-11-29 02:40:55,063 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566450 2023-11-29 02:41:11,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2023-11-29 02:41:24,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3776426.6666666665, ans=0.125 2023-11-29 02:41:26,028 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1350, loss[loss=0.08059, simple_loss=0.1063, pruned_loss=0.01856, audio_tagging_loss=0.008862, over 14629.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08793, pruned_loss=0.01163, audio_tagging_loss=0.00843, over 3023098.11 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:41:53,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3776626.6666666665, ans=0.2 2023-11-29 02:41:56,979 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566500 2023-11-29 02:42:10,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3776693.3333333335, ans=0.125 2023-11-29 02:42:13,868 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:42:14,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3776693.3333333335, ans=0.1 2023-11-29 02:42:22,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.35 vs. limit=6.0 2023-11-29 02:42:23,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3776760.0, ans=0.0 2023-11-29 02:42:23,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3776760.0, ans=0.125 2023-11-29 02:42:29,815 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1400, loss[loss=0.05364, simple_loss=0.07012, pruned_loss=0.008085, audio_tagging_loss=0.01049, over 16457.00 frames. ], tot_loss[loss=0.06351, simple_loss=0.08693, pruned_loss=0.01148, audio_tagging_loss=0.008558, over 3028011.30 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:42:30,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3776826.6666666665, ans=0.1 2023-11-29 02:42:39,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3776826.6666666665, ans=0.125 2023-11-29 02:42:47,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.922e+01 9.372e+01 1.016e+02 1.403e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-29 02:42:58,372 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566550 2023-11-29 02:43:02,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3776960.0, ans=0.1 2023-11-29 02:43:14,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3777026.6666666665, ans=0.1 2023-11-29 02:43:25,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3777093.3333333335, ans=0.2 2023-11-29 02:43:30,427 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1450, loss[loss=0.05824, simple_loss=0.07796, pruned_loss=0.00908, audio_tagging_loss=0.01018, over 15891.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08844, pruned_loss=0.01174, audio_tagging_loss=0.008589, over 3025878.06 frames. 
], batch size: 63, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:43:42,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3777226.6666666665, ans=0.1 2023-11-29 02:43:44,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3777226.6666666665, ans=0.125 2023-11-29 02:43:52,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3777226.6666666665, ans=0.2 2023-11-29 02:44:00,601 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566600 2023-11-29 02:44:10,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3777360.0, ans=0.04949747468305833 2023-11-29 02:44:32,077 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1500, loss[loss=0.04236, simple_loss=0.05521, pruned_loss=0.007228, audio_tagging_loss=0.007529, over 14554.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08804, pruned_loss=0.01163, audio_tagging_loss=0.008702, over 3030880.51 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:44:33,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3777493.3333333335, ans=0.0 2023-11-29 02:44:45,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3777560.0, ans=0.2 2023-11-29 02:44:51,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.184e+01 9.950e+01 1.078e+02 1.281e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-29 02:44:54,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2023-11-29 02:44:59,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-11-29 02:45:00,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-29 02:45:01,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3777626.6666666665, ans=0.125 2023-11-29 02:45:02,353 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566650 2023-11-29 02:45:09,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.59 vs. 
limit=15.0 2023-11-29 02:45:11,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3777693.3333333335, ans=0.125 2023-11-29 02:45:16,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3777693.3333333335, ans=0.0 2023-11-29 02:45:19,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3777760.0, ans=0.125 2023-11-29 02:45:30,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3777760.0, ans=0.0 2023-11-29 02:45:34,471 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1550, loss[loss=0.08089, simple_loss=0.1146, pruned_loss=0.01716, audio_tagging_loss=0.006434, over 15308.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08927, pruned_loss=0.01186, audio_tagging_loss=0.008615, over 3036213.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:45:34,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3777826.6666666665, ans=0.125 2023-11-29 02:45:41,944 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:46:02,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3777960.0, ans=0.125 2023-11-29 02:46:03,607 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566700 2023-11-29 02:46:13,262 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:46:23,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.31 vs. limit=6.0 2023-11-29 02:46:36,487 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1600, loss[loss=0.07482, simple_loss=0.0991, pruned_loss=0.01649, audio_tagging_loss=0.00878, over 15023.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08977, pruned_loss=0.01197, audio_tagging_loss=0.008637, over 3042930.80 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:46:40,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2023-11-29 02:46:42,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778160.0, ans=0.1 2023-11-29 02:46:45,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.51 vs. 
limit=12.0 2023-11-29 02:46:46,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3778160.0, ans=0.1 2023-11-29 02:46:46,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3778160.0, ans=0.125 2023-11-29 02:46:54,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 9.129e+01 9.735e+01 1.042e+02 2.046e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-29 02:46:57,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3778226.6666666665, ans=0.0 2023-11-29 02:47:06,785 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566750 2023-11-29 02:47:27,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3778426.6666666665, ans=0.125 2023-11-29 02:47:36,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3778493.3333333335, ans=0.0 2023-11-29 02:47:37,581 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1650, loss[loss=0.05182, simple_loss=0.07227, pruned_loss=0.006831, audio_tagging_loss=0.008853, over 15543.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08934, pruned_loss=0.01187, audio_tagging_loss=0.008689, over 3036131.44 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:47:39,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0 2023-11-29 02:47:45,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3778493.3333333335, ans=0.125 2023-11-29 02:48:05,507 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:48:07,739 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566800 2023-11-29 02:48:20,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3778693.3333333335, ans=0.125 2023-11-29 02:48:40,036 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1700, loss[loss=0.0543, simple_loss=0.08158, pruned_loss=0.006296, audio_tagging_loss=0.007219, over 15591.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08954, pruned_loss=0.01194, audio_tagging_loss=0.008642, over 3038611.75 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:48:49,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.60 vs. 
limit=12.0 2023-11-29 02:48:57,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3778893.3333333335, ans=0.2 2023-11-29 02:48:58,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 9.116e+01 9.697e+01 1.043e+02 1.617e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 02:49:02,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3778893.3333333335, ans=0.125 2023-11-29 02:49:09,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566850 2023-11-29 02:49:15,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3779026.6666666665, ans=0.025 2023-11-29 02:49:21,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3779026.6666666665, ans=0.125 2023-11-29 02:49:21,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3779026.6666666665, ans=0.04949747468305833 2023-11-29 02:49:36,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3779093.3333333335, ans=0.0 2023-11-29 02:49:38,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=17.67 vs. limit=15.0 2023-11-29 02:49:41,229 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1750, loss[loss=0.06048, simple_loss=0.08096, pruned_loss=0.01119, audio_tagging_loss=0.008814, over 15921.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08789, pruned_loss=0.01159, audio_tagging_loss=0.008683, over 3039106.43 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:50:11,316 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566900 2023-11-29 02:50:41,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3779426.6666666665, ans=0.125 2023-11-29 02:50:43,317 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1800, loss[loss=0.08532, simple_loss=0.1178, pruned_loss=0.02027, audio_tagging_loss=0.00616, over 15483.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08882, pruned_loss=0.01179, audio_tagging_loss=0.008556, over 3039742.93 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:50:50,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3779493.3333333335, ans=0.125 2023-11-29 02:51:02,037 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 9.130e+01 9.771e+01 1.053e+02 1.389e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 02:51:13,241 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566950 2023-11-29 02:51:23,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3779693.3333333335, ans=0.0 2023-11-29 02:51:45,197 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1850, loss[loss=0.05859, simple_loss=0.06703, pruned_loss=0.01509, audio_tagging_loss=0.009989, over 13789.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08887, pruned_loss=0.01194, audio_tagging_loss=0.008536, over 3033865.35 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:51:46,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3779826.6666666665, ans=0.125 2023-11-29 02:51:53,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=22.5 2023-11-29 02:52:15,095 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567000 2023-11-29 02:52:47,569 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1900, loss[loss=0.06623, simple_loss=0.09478, pruned_loss=0.01163, audio_tagging_loss=0.007212, over 16007.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.0882, pruned_loss=0.01182, audio_tagging_loss=0.008492, over 3040539.77 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:53:02,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=22.5 2023-11-29 02:53:06,761 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 9.033e+01 9.601e+01 1.005e+02 1.271e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 02:53:10,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2023-11-29 02:53:17,325 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567050 2023-11-29 02:53:32,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0 2023-11-29 02:53:39,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3780426.6666666665, ans=0.125 2023-11-29 02:53:43,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3780426.6666666665, ans=0.2 2023-11-29 02:53:49,020 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1950, loss[loss=0.05283, simple_loss=0.07068, pruned_loss=0.008965, audio_tagging_loss=0.008518, over 14436.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08808, pruned_loss=0.0118, audio_tagging_loss=0.008445, over 3041513.45 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:53:50,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3780493.3333333335, ans=0.125 2023-11-29 02:53:57,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3780493.3333333335, ans=0.125 2023-11-29 02:54:11,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3780560.0, ans=0.0 2023-11-29 02:54:13,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3780626.6666666665, ans=0.125 2023-11-29 02:54:18,108 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567100 2023-11-29 02:54:51,073 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2000, loss[loss=0.06491, simple_loss=0.09157, pruned_loss=0.01309, audio_tagging_loss=0.006028, over 14404.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08824, pruned_loss=0.01185, audio_tagging_loss=0.008492, over 3042803.87 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:55:03,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3780893.3333333335, ans=0.125 2023-11-29 02:55:03,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3780893.3333333335, ans=0.09899494936611666 2023-11-29 02:55:10,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.857e+01 9.495e+01 1.044e+02 1.385e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 02:55:20,820 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567150 2023-11-29 02:55:20,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3780960.0, ans=0.0 2023-11-29 02:55:21,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3780960.0, ans=0.125 2023-11-29 02:55:30,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-11-29 02:55:31,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3781026.6666666665, ans=0.125 2023-11-29 02:55:37,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3781026.6666666665, ans=0.0 2023-11-29 02:55:40,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3781093.3333333335, ans=0.0 2023-11-29 02:55:40,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2023-11-29 02:55:52,033 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2050, loss[loss=0.06758, simple_loss=0.09391, pruned_loss=0.01352, audio_tagging_loss=0.00711, over 15566.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08914, pruned_loss=0.01204, audio_tagging_loss=0.008448, over 3039717.85 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:55:59,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3781160.0, ans=0.1 2023-11-29 02:56:21,394 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567200 2023-11-29 02:56:25,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3781293.3333333335, ans=0.2 2023-11-29 02:56:46,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3781426.6666666665, ans=0.0 2023-11-29 02:56:51,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3781426.6666666665, ans=0.5 2023-11-29 02:56:53,767 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2100, loss[loss=0.08246, simple_loss=0.125, pruned_loss=0.01438, audio_tagging_loss=0.005555, over 15212.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08928, pruned_loss=0.01194, audio_tagging_loss=0.008408, over 3038985.34 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:56:56,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3781493.3333333335, ans=0.125 2023-11-29 02:57:13,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 9.043e+01 9.646e+01 1.030e+02 1.494e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 02:57:16,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3781560.0, ans=0.125 2023-11-29 02:57:23,108 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567250 2023-11-29 02:57:32,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781693.3333333335, ans=0.1 2023-11-29 02:57:43,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3781760.0, ans=0.125 2023-11-29 02:57:45,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3781760.0, ans=0.09899494936611666 2023-11-29 02:57:45,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3781760.0, ans=0.0 2023-11-29 02:57:55,478 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2150, loss[loss=0.06507, simple_loss=0.09162, pruned_loss=0.01196, audio_tagging_loss=0.007295, over 14787.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08956, pruned_loss=0.012, audio_tagging_loss=0.008478, over 3038870.49 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:58:00,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. limit=10.0 2023-11-29 02:58:15,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-29 02:58:25,425 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567300 2023-11-29 02:58:28,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2023-11-29 02:58:31,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2023-11-29 02:58:34,683 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:58:36,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.16 vs. 
limit=15.0 2023-11-29 02:58:46,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3782093.3333333335, ans=0.125 2023-11-29 02:58:51,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3782093.3333333335, ans=0.0 2023-11-29 02:58:56,771 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2200, loss[loss=0.0669, simple_loss=0.0846, pruned_loss=0.01319, audio_tagging_loss=0.01141, over 14226.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08931, pruned_loss=0.01186, audio_tagging_loss=0.008563, over 3037393.27 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:58:56,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3782160.0, ans=0.125 2023-11-29 02:59:16,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 9.088e+01 9.723e+01 1.043e+02 1.467e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 02:59:19,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3782226.6666666665, ans=0.09899494936611666 2023-11-29 02:59:26,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567350 2023-11-29 02:59:40,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3782360.0, ans=0.2 2023-11-29 02:59:58,760 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2250, loss[loss=0.05449, simple_loss=0.06883, pruned_loss=0.008736, audio_tagging_loss=0.01133, over 14470.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09048, pruned_loss=0.01216, audio_tagging_loss=0.00859, over 3041172.57 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:59:58,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3782493.3333333335, ans=0.2 2023-11-29 02:59:59,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3782493.3333333335, ans=0.2 2023-11-29 03:00:29,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567400 2023-11-29 03:00:33,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.53 vs. 
limit=22.5 2023-11-29 03:00:45,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3782693.3333333335, ans=0.2 2023-11-29 03:00:50,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3782760.0, ans=0.025 2023-11-29 03:00:51,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3782760.0, ans=0.09899494936611666 2023-11-29 03:00:56,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3782760.0, ans=0.125 2023-11-29 03:01:00,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3782826.6666666665, ans=10.0 2023-11-29 03:01:00,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. limit=6.0 2023-11-29 03:01:00,993 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2300, loss[loss=0.06265, simple_loss=0.08582, pruned_loss=0.01047, audio_tagging_loss=0.009269, over 14459.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09029, pruned_loss=0.01207, audio_tagging_loss=0.008605, over 3050708.44 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:01:08,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3782826.6666666665, ans=0.2 2023-11-29 03:01:20,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.062e+01 9.669e+01 1.048e+02 1.317e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 03:01:30,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567450 2023-11-29 03:01:55,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=22.5 2023-11-29 03:01:58,186 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:01:58,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=22.5 2023-11-29 03:02:00,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2023-11-29 03:02:02,746 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2350, loss[loss=0.08801, simple_loss=0.1215, pruned_loss=0.01932, audio_tagging_loss=0.007953, over 15540.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.0901, pruned_loss=0.012, audio_tagging_loss=0.008746, over 3051842.60 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:02:06,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3783160.0, ans=0.2 2023-11-29 03:02:32,371 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567500 2023-11-29 03:03:04,485 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2400, loss[loss=0.08079, simple_loss=0.112, pruned_loss=0.0131, audio_tagging_loss=0.01171, over 15097.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09006, pruned_loss=0.01194, audio_tagging_loss=0.00885, over 3044315.85 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:03:08,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3783493.3333333335, ans=0.125 2023-11-29 03:03:18,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3783560.0, ans=0.5 2023-11-29 03:03:27,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 9.160e+01 9.857e+01 1.036e+02 1.512e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 03:03:32,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3783626.6666666665, ans=0.2 2023-11-29 03:03:34,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567550 2023-11-29 03:03:48,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.03 vs. limit=22.5 2023-11-29 03:04:05,682 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2450, loss[loss=0.07809, simple_loss=0.1125, pruned_loss=0.01444, audio_tagging_loss=0.007417, over 15527.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09017, pruned_loss=0.01193, audio_tagging_loss=0.00884, over 3045191.89 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:04:34,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3783960.0, ans=0.2 2023-11-29 03:04:35,621 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567600 2023-11-29 03:04:44,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3784026.6666666665, ans=0.125 2023-11-29 03:05:06,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2023-11-29 03:05:08,351 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2500, loss[loss=0.05541, simple_loss=0.07488, pruned_loss=0.01085, audio_tagging_loss=0.007115, over 15134.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08962, pruned_loss=0.01178, audio_tagging_loss=0.008858, over 3046319.71 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:05:30,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.821e+01 9.659e+01 1.073e+02 1.403e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-29 03:05:36,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3784293.3333333335, ans=0.125 2023-11-29 03:05:37,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567650 2023-11-29 03:05:43,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3784360.0, ans=0.125 2023-11-29 03:05:54,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3784360.0, ans=0.125 2023-11-29 03:06:09,247 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2550, loss[loss=0.05067, simple_loss=0.07227, pruned_loss=0.005132, audio_tagging_loss=0.009406, over 15424.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08896, pruned_loss=0.01181, audio_tagging_loss=0.008733, over 3044465.17 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:06:10,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3784493.3333333335, ans=0.1 2023-11-29 03:06:12,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3784493.3333333335, ans=0.2 2023-11-29 03:06:22,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3784560.0, ans=0.0 2023-11-29 03:06:40,443 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567700 2023-11-29 03:06:46,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3784693.3333333335, ans=0.125 2023-11-29 03:07:12,027 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2600, loss[loss=0.07507, simple_loss=0.09603, pruned_loss=0.01725, audio_tagging_loss=0.009806, over 14940.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08878, pruned_loss=0.01175, audio_tagging_loss=0.008642, over 3043482.59 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:07:34,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.651e+01 9.416e+01 1.044e+02 1.400e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-29 03:07:42,215 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567750 2023-11-29 03:08:00,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3785093.3333333335, ans=0.125 2023-11-29 03:08:06,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=12.0 2023-11-29 03:08:10,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3785093.3333333335, ans=0.125 2023-11-29 03:08:14,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3785160.0, ans=0.0 2023-11-29 03:08:14,953 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2650, loss[loss=0.06635, simple_loss=0.09183, pruned_loss=0.01273, audio_tagging_loss=0.007701, over 14962.00 frames. 
], tot_loss[loss=0.0649, simple_loss=0.08911, pruned_loss=0.01178, audio_tagging_loss=0.008565, over 3039222.79 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:08:19,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3785160.0, ans=0.0 2023-11-29 03:08:27,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3785226.6666666665, ans=0.1 2023-11-29 03:08:43,352 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567800 2023-11-29 03:09:15,735 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2700, loss[loss=0.05981, simple_loss=0.07793, pruned_loss=0.01264, audio_tagging_loss=0.008212, over 15450.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08867, pruned_loss=0.01179, audio_tagging_loss=0.008602, over 3046949.52 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:09:18,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3785493.3333333335, ans=0.0 2023-11-29 03:09:20,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3785493.3333333335, ans=0.125 2023-11-29 03:09:22,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3785493.3333333335, ans=0.125 2023-11-29 03:09:25,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3785493.3333333335, ans=0.125 2023-11-29 03:09:38,838 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 9.139e+01 9.804e+01 1.056e+02 1.449e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 03:09:46,633 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567850 2023-11-29 03:09:55,075 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:10:17,667 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2750, loss[loss=0.05694, simple_loss=0.07916, pruned_loss=0.008637, audio_tagging_loss=0.008723, over 14425.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08871, pruned_loss=0.01175, audio_tagging_loss=0.008592, over 3050047.24 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:10:24,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3785826.6666666665, ans=0.125 2023-11-29 03:10:33,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2023-11-29 03:10:47,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567900 2023-11-29 03:11:12,712 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 03:11:19,924 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2800, loss[loss=0.06105, simple_loss=0.08977, pruned_loss=0.007142, audio_tagging_loss=0.009022, over 15016.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08832, pruned_loss=0.01161, audio_tagging_loss=0.008683, over 3046993.63 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:11:20,154 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:11:25,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-11-29 03:11:26,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3786160.0, ans=0.125 2023-11-29 03:11:33,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3786226.6666666665, ans=0.2 2023-11-29 03:11:43,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 9.033e+01 9.912e+01 1.050e+02 3.585e+02, threshold=1.982e+02, percent-clipped=1.0 2023-11-29 03:11:49,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567950 2023-11-29 03:11:56,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3786360.0, ans=0.1 2023-11-29 03:12:21,954 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2850, loss[loss=0.04375, simple_loss=0.05088, pruned_loss=0.005656, audio_tagging_loss=0.01266, over 14336.00 frames. ], tot_loss[loss=0.064, simple_loss=0.08756, pruned_loss=0.01152, audio_tagging_loss=0.008705, over 3035024.73 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:12:25,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3786493.3333333335, ans=0.125 2023-11-29 03:12:51,688 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568000 2023-11-29 03:13:02,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3786693.3333333335, ans=0.0 2023-11-29 03:13:23,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3786760.0, ans=0.125 2023-11-29 03:13:25,905 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2900, loss[loss=0.06044, simple_loss=0.07281, pruned_loss=0.01267, audio_tagging_loss=0.01136, over 15599.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08803, pruned_loss=0.01165, audio_tagging_loss=0.00861, over 3040260.62 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:13:38,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3786893.3333333335, ans=0.0 2023-11-29 03:13:51,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.068e+01 9.075e+01 9.599e+01 1.049e+02 1.799e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 03:13:56,080 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568050 2023-11-29 03:14:17,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3787093.3333333335, ans=0.05 2023-11-29 03:14:28,242 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2950, loss[loss=0.05895, simple_loss=0.07292, pruned_loss=0.01377, audio_tagging_loss=0.008722, over 15478.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08856, pruned_loss=0.01186, audio_tagging_loss=0.008614, over 3038355.56 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:14:29,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3787160.0, ans=0.07 2023-11-29 03:14:40,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2023-11-29 03:14:41,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=22.5 2023-11-29 03:14:46,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3787226.6666666665, ans=0.07 2023-11-29 03:14:47,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-29 03:14:52,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-29 03:14:54,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3787293.3333333335, ans=0.0 2023-11-29 03:14:57,823 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568100 2023-11-29 03:15:11,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3787360.0, ans=0.1 2023-11-29 03:15:17,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2023-11-29 03:15:30,047 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3000, loss[loss=0.07315, simple_loss=0.1019, pruned_loss=0.01483, audio_tagging_loss=0.007364, over 15658.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08895, pruned_loss=0.01196, audio_tagging_loss=0.008569, over 3039264.76 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:15:30,048 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 03:16:09,294 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2973, 3.7997, 3.2673, 3.7128], device='cuda:3') 2023-11-29 03:16:11,305 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05793, simple_loss=0.05039, pruned_loss=0.005256, audio_tagging_loss=0.02748, over 4681554.00 frames. 2023-11-29 03:16:11,306 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 03:16:20,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3787493.3333333335, ans=0.0 2023-11-29 03:16:35,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 9.350e+01 9.749e+01 1.060e+02 2.355e+02, threshold=1.950e+02, percent-clipped=1.0 2023-11-29 03:16:41,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568150 2023-11-29 03:16:54,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=15.0 2023-11-29 03:17:13,329 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3050, loss[loss=0.0605, simple_loss=0.0878, pruned_loss=0.009653, audio_tagging_loss=0.006954, over 14143.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08885, pruned_loss=0.0119, audio_tagging_loss=0.008585, over 3036782.46 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:17:17,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3787826.6666666665, ans=0.125 2023-11-29 03:17:19,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3787826.6666666665, ans=0.0 2023-11-29 03:17:41,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3787960.0, ans=0.125 2023-11-29 03:17:42,883 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568200 2023-11-29 03:17:48,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3787960.0, ans=0.125 2023-11-29 03:17:51,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0 2023-11-29 03:17:51,309 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 03:17:58,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3788026.6666666665, ans=0.125 2023-11-29 03:18:01,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3788026.6666666665, ans=0.0 2023-11-29 03:18:02,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3788093.3333333335, ans=0.125 2023-11-29 03:18:15,761 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3100, loss[loss=0.05061, simple_loss=0.06197, pruned_loss=0.008287, audio_tagging_loss=0.01134, over 14958.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08923, pruned_loss=0.01204, audio_tagging_loss=0.008677, over 3039780.81 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:18:33,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3788226.6666666665, ans=0.1 2023-11-29 03:18:39,838 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.957e+01 9.617e+01 1.028e+02 1.274e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 03:18:45,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568250 2023-11-29 03:19:07,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3788426.6666666665, ans=0.2 2023-11-29 03:19:17,461 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3150, loss[loss=0.07622, simple_loss=0.1013, pruned_loss=0.01638, audio_tagging_loss=0.009192, over 15089.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09016, pruned_loss=0.01204, audio_tagging_loss=0.008766, over 3050281.06 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:19:17,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3788493.3333333335, ans=0.0 2023-11-29 03:19:28,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=12.0 2023-11-29 03:19:47,270 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568300 2023-11-29 03:20:16,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3788760.0, ans=0.125 2023-11-29 03:20:18,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-11-29 03:20:19,171 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3200, loss[loss=0.07346, simple_loss=0.1101, pruned_loss=0.01211, audio_tagging_loss=0.006283, over 15329.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09024, pruned_loss=0.01207, audio_tagging_loss=0.008828, over 3046816.51 frames. 
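[Note on the recurring "Exclude cut" WARNING: these are AudioSet clips (IDs like unbalanced/h0neUGB6j_g_0.000_1.000.wav, i.e. 1.000 s of audio ≈ 100 frames at a 10 ms hop) that carry only a dummy placeholder transcript. After 4x subsampling the 100 input frames become 23 encoder frames — consistent with (100 - 7) // 4 — which is fewer than the 24 BPE tokens, and a transducer cannot emit more tokens than it has encoder frames, so the cut is dropped. An illustrative version of the filter; the "-7" context-loss term is our assumption, calibrated only against the logged 100 -> 23:

    def keep_for_transducer(num_frames, num_tokens, subsampling_factor=4):
        # Encoder frame count after convolutional subsampling; the "-7"
        # context loss is an assumption that matches the logged 100 -> 23.
        frames_after = (num_frames - 7) // subsampling_factor
        # A transducer cannot emit more tokens than it has encoder frames.
        return frames_after >= num_tokens

    assert keep_for_transducer(100, 24) is False  # 23 frames < 24 tokens: dropped
]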
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:20:30,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788893.3333333335, ans=0.1 2023-11-29 03:20:35,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3788893.3333333335, ans=0.1 2023-11-29 03:20:35,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3788893.3333333335, ans=0.0 2023-11-29 03:20:44,394 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.999e+01 9.702e+01 1.062e+02 1.415e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 03:20:49,391 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568350 2023-11-29 03:20:50,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788960.0, ans=0.1 2023-11-29 03:21:00,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=22.5 2023-11-29 03:21:01,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3789026.6666666665, ans=0.02 2023-11-29 03:21:21,239 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3250, loss[loss=0.05269, simple_loss=0.06326, pruned_loss=0.007218, audio_tagging_loss=0.01384, over 14097.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09009, pruned_loss=0.01194, audio_tagging_loss=0.00891, over 3043130.92 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:21:21,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5 2023-11-29 03:21:27,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3789160.0, ans=0.0 2023-11-29 03:21:30,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3789160.0, ans=0.125 2023-11-29 03:21:46,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3789293.3333333335, ans=0.0 2023-11-29 03:21:50,717 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568400 2023-11-29 03:21:55,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2023-11-29 03:21:59,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3789360.0, ans=0.0 2023-11-29 03:22:04,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3789360.0, ans=0.125 2023-11-29 03:22:10,217 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:22:24,200 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3300, loss[loss=0.08487, simple_loss=0.1189, pruned_loss=0.01738, audio_tagging_loss=0.008044, over 15379.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09026, pruned_loss=0.01189, audio_tagging_loss=0.009061, over 3046067.51 frames. 
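[Note on the validation pass a few records above (batch 3000): the validation loss decomposes with the same weights as training, 0.5 * 0.05039 + 0.005256 + 0.02748 ≈ 0.05793, and the "Maximum memory allocated so far is 24894MB" line reports CUDA's peak-allocation counter for the device. A minimal sketch of the memory readout; the MB conversion (2**20 bytes) is our assumption about the log line:

    import torch

    def max_memory_mb(device: torch.device) -> int:
        # Peak CUDA allocation on this device since the last reset; the
        # MB conversion (2**20 bytes) is assumed, not taken from train_asr.py.
        return int(torch.cuda.max_memory_allocated(device) // (2 ** 20))
]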
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:22:48,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 9.011e+01 9.553e+01 1.025e+02 1.344e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 03:22:49,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3789626.6666666665, ans=0.125 2023-11-29 03:22:53,482 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568450 2023-11-29 03:22:58,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3789626.6666666665, ans=0.125 2023-11-29 03:23:21,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3789760.0, ans=0.2 2023-11-29 03:23:25,022 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3350, loss[loss=0.07115, simple_loss=0.105, pruned_loss=0.01253, audio_tagging_loss=0.00614, over 15133.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08976, pruned_loss=0.01187, audio_tagging_loss=0.008981, over 3048137.48 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:23:34,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3789826.6666666665, ans=0.125 2023-11-29 03:23:38,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=15.0 2023-11-29 03:23:38,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-11-29 03:23:40,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3789893.3333333335, ans=0.125 2023-11-29 03:23:53,717 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:23:55,268 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568500 2023-11-29 03:24:01,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3790026.6666666665, ans=0.125 2023-11-29 03:24:17,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3790093.3333333335, ans=0.125 2023-11-29 03:24:26,777 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3400, loss[loss=0.05018, simple_loss=0.06884, pruned_loss=0.007068, audio_tagging_loss=0.008696, over 14797.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08961, pruned_loss=0.01184, audio_tagging_loss=0.00887, over 3049427.42 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:24:48,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3790226.6666666665, ans=0.2 2023-11-29 03:24:51,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.05 vs. 
limit=10.0 2023-11-29 03:24:51,483 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.195e+01 9.061e+01 9.646e+01 1.021e+02 1.209e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 03:24:56,214 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568550 2023-11-29 03:25:04,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-11-29 03:25:07,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3790360.0, ans=0.1 2023-11-29 03:25:09,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3790360.0, ans=0.0 2023-11-29 03:25:28,267 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3450, loss[loss=0.06621, simple_loss=0.09328, pruned_loss=0.01299, audio_tagging_loss=0.006584, over 15250.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09008, pruned_loss=0.01204, audio_tagging_loss=0.008722, over 3050979.50 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:25:44,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3790560.0, ans=22.5 2023-11-29 03:25:57,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3790626.6666666665, ans=0.125 2023-11-29 03:25:58,441 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568600 2023-11-29 03:26:05,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3790693.3333333335, ans=0.0 2023-11-29 03:26:10,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3790693.3333333335, ans=0.1 2023-11-29 03:26:16,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790760.0, ans=0.1 2023-11-29 03:26:30,404 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3500, loss[loss=0.07274, simple_loss=0.1042, pruned_loss=0.01193, audio_tagging_loss=0.008726, over 14880.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.0899, pruned_loss=0.01179, audio_tagging_loss=0.008631, over 3047004.35 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:26:54,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.797e+01 9.462e+01 1.015e+02 1.238e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 03:27:00,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568650 2023-11-29 03:27:04,665 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 03:27:04,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3790960.0, ans=0.07 2023-11-29 03:27:19,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3791093.3333333335, ans=0.035 2023-11-29 03:27:28,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3791093.3333333335, ans=0.125 2023-11-29 03:27:32,452 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3550, loss[loss=0.04942, simple_loss=0.06561, pruned_loss=0.006068, audio_tagging_loss=0.01055, over 15133.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08918, pruned_loss=0.01167, audio_tagging_loss=0.00851, over 3045204.87 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:27:35,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5 2023-11-29 03:28:01,832 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568700 2023-11-29 03:28:23,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3791426.6666666665, ans=0.125 2023-11-29 03:28:30,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3791426.6666666665, ans=0.125 2023-11-29 03:28:34,002 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3600, loss[loss=0.06942, simple_loss=0.09565, pruned_loss=0.01217, audio_tagging_loss=0.009428, over 15805.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.0886, pruned_loss=0.01149, audio_tagging_loss=0.008507, over 3047214.74 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:28:51,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.60 vs. limit=10.0 2023-11-29 03:28:59,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.871e+01 9.571e+01 1.037e+02 1.255e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 03:29:04,047 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568750 2023-11-29 03:29:12,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3791693.3333333335, ans=10.0 2023-11-29 03:29:35,715 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3650, loss[loss=0.06165, simple_loss=0.08199, pruned_loss=0.01194, audio_tagging_loss=0.008708, over 15622.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08899, pruned_loss=0.01159, audio_tagging_loss=0.008411, over 3047373.52 frames. 
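[Note on the grad_scale column: this run trains in fp16 with dynamic loss scaling, and the column moves accordingly — 16 at batch 2800, halved to 8 by batch 2900 (an overflow step), grown back to 16 by batch 3200 and 32 by batch 3600, then halved again to 16 at batch 3750. A generic sketch of that policy; the actual growth interval and factors used by this run are assumptions (torch.cuda.amp.GradScaler defaults shown, though the quick regrowth here suggests a shorter interval):

    def update_grad_scale(scale, found_inf, steps_since_backoff,
                          growth_interval=2000, backoff=0.5, growth=2.0):
        # Halve the loss scale on an inf/nan gradient; double it after
        # `growth_interval` consecutive clean steps. Interval and factors
        # are GradScaler defaults, assumed rather than verified for this run.
        if found_inf:
            return scale * backoff, 0
        steps_since_backoff += 1
        if steps_since_backoff >= growth_interval:
            return scale * growth, 0
        return scale, steps_since_backoff
]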
], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:29:52,631 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:30:05,473 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568800 2023-11-29 03:30:19,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3792026.6666666665, ans=0.2 2023-11-29 03:30:30,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3792093.3333333335, ans=0.125 2023-11-29 03:30:37,609 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3700, loss[loss=0.05312, simple_loss=0.06487, pruned_loss=0.009877, audio_tagging_loss=0.0108, over 15649.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08887, pruned_loss=0.01155, audio_tagging_loss=0.008419, over 3053960.85 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:30:48,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3792226.6666666665, ans=0.125 2023-11-29 03:30:58,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3792226.6666666665, ans=0.1 2023-11-29 03:31:03,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 9.169e+01 9.957e+01 1.078e+02 1.355e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-29 03:31:07,233 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568850 2023-11-29 03:31:24,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3792360.0, ans=0.0 2023-11-29 03:31:35,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3792426.6666666665, ans=0.0 2023-11-29 03:31:36,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3792426.6666666665, ans=0.125 2023-11-29 03:31:37,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=22.5 2023-11-29 03:31:40,467 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3750, loss[loss=0.07619, simple_loss=0.1077, pruned_loss=0.01246, audio_tagging_loss=0.009874, over 15324.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08954, pruned_loss=0.01183, audio_tagging_loss=0.00849, over 3051687.02 frames. 
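[Note on the scaling.py:213 ScheduledFloat records: these dump regularization hyperparameters (dropout_p, skip rates, balancer probs, scale_min) that are not fixed constants but scheduled as piecewise-linear functions of the module's batch_count; by batch_count ≈ 3.79e6 they have all settled at their final values (ans=0.1, 0.125, 0.2, ...). A minimal sketch of such a schedule — the interface is ours, not scaling.py's:

    def scheduled_float(batch_count: float, points) -> float:
        # points: [(batch_count, value), ...], sorted by batch_count.
        # Piecewise-linear in batch_count, clamped outside the range.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return points[-1][1]

    # e.g. a dropout that anneals 0.3 -> 0.1 over the first 20k batch-units;
    # by batch_count ~3.79e6 it has long since reached its final 0.1.
    assert scheduled_float(3_786_360.0, [(0.0, 0.3), (20_000.0, 0.1)]) == 0.1
]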
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:31:46,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3792493.3333333335, ans=0.125 2023-11-29 03:31:47,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3792493.3333333335, ans=0.0 2023-11-29 03:31:50,889 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:32:03,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3792560.0, ans=0.125 2023-11-29 03:32:04,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3792626.6666666665, ans=0.125 2023-11-29 03:32:11,200 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568900 2023-11-29 03:32:16,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3792626.6666666665, ans=0.1 2023-11-29 03:32:17,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3792693.3333333335, ans=0.2 2023-11-29 03:32:23,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0 2023-11-29 03:32:26,200 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:32:27,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3792693.3333333335, ans=0.1 2023-11-29 03:32:42,231 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3800, loss[loss=0.06305, simple_loss=0.09027, pruned_loss=0.01089, audio_tagging_loss=0.007017, over 15508.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08985, pruned_loss=0.01187, audio_tagging_loss=0.008521, over 3057973.07 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:32:58,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3792893.3333333335, ans=0.0 2023-11-29 03:33:01,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3792893.3333333335, ans=0.125 2023-11-29 03:33:03,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3792893.3333333335, ans=0.125 2023-11-29 03:33:08,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.049e+01 9.763e+01 1.085e+02 1.488e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 03:33:11,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568950 2023-11-29 03:33:14,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3792960.0, ans=0.125 2023-11-29 03:33:24,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-29 03:33:34,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3793093.3333333335, ans=0.0 2023-11-29 03:33:42,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3793093.3333333335, ans=0.125 2023-11-29 03:33:44,621 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3850, loss[loss=0.07658, simple_loss=0.1056, pruned_loss=0.01763, audio_tagging_loss=0.006148, over 14302.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08922, pruned_loss=0.01175, audio_tagging_loss=0.008541, over 3052392.38 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:33:55,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3793226.6666666665, ans=0.2 2023-11-29 03:34:13,340 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569000 2023-11-29 03:34:16,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3793293.3333333335, ans=0.125 2023-11-29 03:34:22,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.49 vs. limit=15.0 2023-11-29 03:34:45,166 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3900, loss[loss=0.07278, simple_loss=0.1025, pruned_loss=0.01537, audio_tagging_loss=0.006178, over 15154.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.0891, pruned_loss=0.01176, audio_tagging_loss=0.008555, over 3047386.05 frames. 
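[Note on the constant "lr: 1.41e-03": this is consistent with icefall's Eden schedule, which decays the base LR with both the global batch count and the epoch, lr = base_lr * ((batch/lr_batches)^2 + 1)^(-1/4) * ((epoch/lr_epochs)^2 + 1)^(-1/4). With base_lr=0.045, lr_batches=7500, lr_epochs=3.5, a global step near 568k and an epoch input of about 47 reproduces the printed value; Eden's epoch/batch inputs may be offset or duration-scaled relative to the printed ones, so this is a plausibility check, not the exact call:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Eden-style schedule: smooth inverse-quartic decay in both the
        # global batch count and the (fractional) epoch.
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    print(eden_lr(0.045, batch=568_050, epoch=47))  # -> ~1.41e-03, as printed
]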
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:34:51,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3793493.3333333335, ans=0.2 2023-11-29 03:35:10,782 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 9.044e+01 9.626e+01 1.053e+02 1.477e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 03:35:12,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3793626.6666666665, ans=0.125 2023-11-29 03:35:15,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569050 2023-11-29 03:35:16,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3793626.6666666665, ans=0.0 2023-11-29 03:35:28,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3793693.3333333335, ans=0.125 2023-11-29 03:35:42,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2023-11-29 03:35:46,716 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3950, loss[loss=0.06757, simple_loss=0.09884, pruned_loss=0.01112, audio_tagging_loss=0.007028, over 16750.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08979, pruned_loss=0.01184, audio_tagging_loss=0.008656, over 3049963.95 frames. ], batch size: 64, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:35:56,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3793826.6666666665, ans=0.04949747468305833 2023-11-29 03:36:15,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3793960.0, ans=0.0 2023-11-29 03:36:16,284 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569100 2023-11-29 03:36:17,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3793960.0, ans=0.2 2023-11-29 03:36:41,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3794093.3333333335, ans=0.0 2023-11-29 03:36:48,491 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4000, loss[loss=0.06098, simple_loss=0.07979, pruned_loss=0.01128, audio_tagging_loss=0.009808, over 15576.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09061, pruned_loss=0.01203, audio_tagging_loss=0.008741, over 3050839.31 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:36:58,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3794160.0, ans=0.2 2023-11-29 03:37:13,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3794293.3333333335, ans=0.0 2023-11-29 03:37:14,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.940e+01 9.124e+01 9.854e+01 1.064e+02 1.398e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 03:37:16,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.20 vs. 
limit=15.0 2023-11-29 03:37:18,300 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569150 2023-11-29 03:37:43,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3794426.6666666665, ans=0.125 2023-11-29 03:37:49,709 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4050, loss[loss=0.0744, simple_loss=0.1005, pruned_loss=0.01183, audio_tagging_loss=0.01233, over 14385.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09037, pruned_loss=0.01211, audio_tagging_loss=0.008857, over 3044680.44 frames. ], batch size: 52, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:37:51,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3794493.3333333335, ans=0.1 2023-11-29 03:37:54,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3794493.3333333335, ans=0.0 2023-11-29 03:37:55,471 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:37:58,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3794493.3333333335, ans=0.125 2023-11-29 03:37:58,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3794493.3333333335, ans=0.0 2023-11-29 03:38:02,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3794560.0, ans=0.1 2023-11-29 03:38:12,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=15.0 2023-11-29 03:38:12,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3794626.6666666665, ans=0.125 2023-11-29 03:38:19,646 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569200 2023-11-29 03:38:37,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3794693.3333333335, ans=0.1 2023-11-29 03:38:51,638 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4100, loss[loss=0.07429, simple_loss=0.1013, pruned_loss=0.01447, audio_tagging_loss=0.009188, over 16466.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09038, pruned_loss=0.01216, audio_tagging_loss=0.008754, over 3053491.08 frames. 
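[Note on the scaling.py:1022 Whitening records: each compares a measured whiteness statistic of a module's activations against its current limit (e.g. metric=10.68 vs. limit=15.0 above). To our understanding the Whiten module only applies its corrective gradient when the metric exceeds the limit, so most records here are informational. The metric is roughly "how far the (grouped) channel covariance is from a multiple of the identity", with 1.0 meaning perfectly white. An illustrative reconstruction, not scaling.py verbatim:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels). Returns the ratio of the mean squared
        # eigenvalue of the per-group covariance to the squared mean eigenvalue;
        # equals 1.0 iff the covariance is a multiple of the identity.
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        c = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, c)
        x = x - x.mean(dim=0, keepdim=True)
        cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames  # per-group covariance
        sum_eig_sq = (cov * cov).sum(dim=(1, 2))               # sum of squared eigenvalues
        mean_eig = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) / c
        return (sum_eig_sq / (mean_eig ** 2 * c)).mean().item()

    print(whitening_metric(torch.randn(2000, 384)))  # close to 1.0 for white noise
]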
], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:38:51,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3794826.6666666665, ans=0.125 2023-11-29 03:39:03,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3794893.3333333335, ans=0.125 2023-11-29 03:39:12,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3794893.3333333335, ans=0.125 2023-11-29 03:39:13,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=22.5 2023-11-29 03:39:18,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3794960.0, ans=0.0 2023-11-29 03:39:19,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 9.014e+01 9.699e+01 1.029e+02 1.254e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 03:39:21,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569250 2023-11-29 03:39:38,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2023-11-29 03:39:49,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3795093.3333333335, ans=0.2 2023-11-29 03:39:52,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3795160.0, ans=0.0 2023-11-29 03:39:53,511 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4150, loss[loss=0.06743, simple_loss=0.08743, pruned_loss=0.01078, audio_tagging_loss=0.01294, over 14190.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09076, pruned_loss=0.01227, audio_tagging_loss=0.008668, over 3049710.76 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:39:56,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5 2023-11-29 03:40:05,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.04 vs. limit=10.0 2023-11-29 03:40:22,866 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569300 2023-11-29 03:40:25,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.83 vs. limit=22.5 2023-11-29 03:40:27,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3795293.3333333335, ans=0.0 2023-11-29 03:40:41,462 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 03:40:48,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3795426.6666666665, ans=0.2 2023-11-29 03:40:50,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3795426.6666666665, ans=0.125 2023-11-29 03:40:54,924 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4200, loss[loss=0.06476, simple_loss=0.1008, pruned_loss=0.008478, audio_tagging_loss=0.005906, over 16109.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09052, pruned_loss=0.01214, audio_tagging_loss=0.008605, over 3050561.25 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:41:01,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3795493.3333333335, ans=15.0 2023-11-29 03:41:02,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3795493.3333333335, ans=0.5 2023-11-29 03:41:05,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2023-11-29 03:41:12,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795560.0, ans=0.1 2023-11-29 03:41:13,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3795560.0, ans=0.125 2023-11-29 03:41:19,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-29 03:41:21,620 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.132e+01 9.847e+01 1.051e+02 1.276e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-29 03:41:22,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-29 03:41:23,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-29 03:41:23,945 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569350 2023-11-29 03:41:34,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3795693.3333333335, ans=0.0 2023-11-29 03:41:35,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795693.3333333335, ans=0.1 2023-11-29 03:41:35,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2023-11-29 03:41:56,095 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4250, loss[loss=0.07973, simple_loss=0.114, pruned_loss=0.01613, audio_tagging_loss=0.006591, over 15359.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09097, pruned_loss=0.01214, audio_tagging_loss=0.008471, over 3051255.41 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:42:01,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3795826.6666666665, ans=0.0 2023-11-29 03:42:04,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3795826.6666666665, ans=0.2 2023-11-29 03:42:05,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3795826.6666666665, ans=0.0 2023-11-29 03:42:24,964 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569400 2023-11-29 03:42:29,594 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:42:40,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3796026.6666666665, ans=0.0 2023-11-29 03:42:54,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3796093.3333333335, ans=0.125 2023-11-29 03:42:55,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3796093.3333333335, ans=0.125 2023-11-29 03:42:56,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3796160.0, ans=0.0 2023-11-29 03:42:57,149 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4300, loss[loss=0.06102, simple_loss=0.08848, pruned_loss=0.008344, audio_tagging_loss=0.008435, over 15589.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09171, pruned_loss=0.0123, audio_tagging_loss=0.00837, over 3056616.97 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:42:58,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=3796160.0, ans=12.0 2023-11-29 03:43:12,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2023-11-29 03:43:13,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3796226.6666666665, ans=0.0 2023-11-29 03:43:24,068 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 9.187e+01 9.912e+01 1.060e+02 1.366e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 03:43:27,205 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569450 2023-11-29 03:43:29,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.94 vs. limit=15.0 2023-11-29 03:43:30,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3796293.3333333335, ans=0.025 2023-11-29 03:43:58,127 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4350, loss[loss=0.08294, simple_loss=0.1206, pruned_loss=0.01502, audio_tagging_loss=0.007639, over 15405.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.091, pruned_loss=0.01213, audio_tagging_loss=0.008409, over 3052670.83 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:44:19,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. 
limit=15.0 2023-11-29 03:44:22,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3796626.6666666665, ans=0.0 2023-11-29 03:44:27,868 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569500 2023-11-29 03:44:32,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0 2023-11-29 03:44:33,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3796626.6666666665, ans=0.05 2023-11-29 03:44:39,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3796693.3333333335, ans=0.5 2023-11-29 03:44:42,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3796693.3333333335, ans=0.125 2023-11-29 03:44:43,936 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:44:46,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2023-11-29 03:44:53,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3796760.0, ans=0.0 2023-11-29 03:45:00,052 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4400, loss[loss=0.08124, simple_loss=0.1228, pruned_loss=0.01537, audio_tagging_loss=0.004456, over 16371.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.09008, pruned_loss=0.01197, audio_tagging_loss=0.008439, over 3055072.10 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:45:05,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2023-11-29 03:45:06,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=22.5 2023-11-29 03:45:21,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2023-11-29 03:45:24,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3796960.0, ans=0.125 2023-11-29 03:45:26,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.800e+01 8.869e+01 9.573e+01 1.013e+02 1.408e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 03:45:28,852 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569550 2023-11-29 03:45:35,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3797026.6666666665, ans=0.0 2023-11-29 03:46:00,723 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4450, loss[loss=0.09586, simple_loss=0.1358, pruned_loss=0.02121, audio_tagging_loss=0.006746, over 16232.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08986, pruned_loss=0.01208, audio_tagging_loss=0.008463, over 3050131.88 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:46:13,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3797226.6666666665, ans=0.125 2023-11-29 03:46:30,079 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569600 2023-11-29 03:46:49,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0 2023-11-29 03:46:49,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3797426.6666666665, ans=0.125 2023-11-29 03:46:57,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3797426.6666666665, ans=0.0 2023-11-29 03:47:01,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0 2023-11-29 03:47:02,118 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4500, loss[loss=0.08144, simple_loss=0.1114, pruned_loss=0.01532, audio_tagging_loss=0.01041, over 15691.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.09017, pruned_loss=0.01213, audio_tagging_loss=0.008366, over 3061567.37 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:47:10,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3797493.3333333335, ans=0.0 2023-11-29 03:47:29,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.934e+01 9.508e+01 1.012e+02 1.257e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 03:47:31,522 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569650 2023-11-29 03:47:36,434 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:48:00,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-29 03:48:02,597 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4550, loss[loss=0.06562, simple_loss=0.09055, pruned_loss=0.01185, audio_tagging_loss=0.008486, over 15745.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08939, pruned_loss=0.01205, audio_tagging_loss=0.008433, over 3058960.30 frames. 
], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:48:02,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3797826.6666666665, ans=0.0 2023-11-29 03:48:02,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3797826.6666666665, ans=0.2 2023-11-29 03:48:25,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3797893.3333333335, ans=0.125 2023-11-29 03:48:32,892 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569700 2023-11-29 03:48:35,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3797960.0, ans=0.1 2023-11-29 03:48:36,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3797960.0, ans=0.125 2023-11-29 03:48:53,600 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:49:04,178 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4600, loss[loss=0.05853, simple_loss=0.07603, pruned_loss=0.01098, audio_tagging_loss=0.00953, over 14143.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08868, pruned_loss=0.01192, audio_tagging_loss=0.008539, over 3054268.32 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:49:19,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3798226.6666666665, ans=0.125 2023-11-29 03:49:20,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3798226.6666666665, ans=0.1 2023-11-29 03:49:30,969 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.831e+01 9.354e+01 1.006e+02 1.240e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-29 03:49:33,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569750 2023-11-29 03:49:46,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2023-11-29 03:49:51,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3798360.0, ans=0.0 2023-11-29 03:49:58,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.44 vs. limit=10.0 2023-11-29 03:50:05,622 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4650, loss[loss=0.06051, simple_loss=0.08239, pruned_loss=0.01036, audio_tagging_loss=0.008957, over 15040.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08831, pruned_loss=0.0119, audio_tagging_loss=0.008684, over 3055161.58 frames. 
], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:50:17,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3798560.0, ans=0.1 2023-11-29 03:50:23,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3798560.0, ans=0.0 2023-11-29 03:50:34,909 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569800 2023-11-29 03:50:38,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3798626.6666666665, ans=0.2 2023-11-29 03:50:58,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3798760.0, ans=0.125 2023-11-29 03:51:06,104 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4700, loss[loss=0.06171, simple_loss=0.08396, pruned_loss=0.01025, audio_tagging_loss=0.009483, over 16190.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08748, pruned_loss=0.01185, audio_tagging_loss=0.008686, over 3055088.56 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:51:08,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3798826.6666666665, ans=0.125 2023-11-29 03:51:08,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=12.0 2023-11-29 03:51:12,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3798826.6666666665, ans=0.1 2023-11-29 03:51:12,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3798826.6666666665, ans=10.0 2023-11-29 03:51:33,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 9.105e+01 9.941e+01 1.052e+02 1.267e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 03:51:36,187 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569850 2023-11-29 03:51:37,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3798960.0, ans=0.1 2023-11-29 03:51:39,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3798960.0, ans=0.125 2023-11-29 03:51:39,876 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:51:42,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2023-11-29 03:52:04,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3799093.3333333335, ans=0.2 2023-11-29 03:52:08,216 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4750, loss[loss=0.09155, simple_loss=0.1154, pruned_loss=0.02613, audio_tagging_loss=0.007725, over 15544.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.0888, pruned_loss=0.01217, audio_tagging_loss=0.008724, over 3061496.85 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:52:14,144 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:52:20,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3799226.6666666665, ans=0.1 2023-11-29 03:52:23,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3799226.6666666665, ans=0.125 2023-11-29 03:52:30,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3799293.3333333335, ans=0.0 2023-11-29 03:52:36,652 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569900 2023-11-29 03:52:39,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3799293.3333333335, ans=0.125 2023-11-29 03:52:40,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3799293.3333333335, ans=0.125 2023-11-29 03:52:48,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3799360.0, ans=0.2 2023-11-29 03:53:09,910 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4800, loss[loss=0.06447, simple_loss=0.09378, pruned_loss=0.01144, audio_tagging_loss=0.006134, over 15671.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08863, pruned_loss=0.01203, audio_tagging_loss=0.008777, over 3053772.71 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:53:14,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3799493.3333333335, ans=0.125 2023-11-29 03:53:21,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0 2023-11-29 03:53:37,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.976e+01 9.656e+01 1.035e+02 1.213e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 03:53:38,611 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569950 2023-11-29 03:53:51,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3799693.3333333335, ans=0.125 2023-11-29 03:54:09,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3799760.0, ans=0.0 2023-11-29 03:54:11,291 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4850, loss[loss=0.05458, simple_loss=0.07592, pruned_loss=0.007799, audio_tagging_loss=0.008823, over 16154.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08868, pruned_loss=0.012, audio_tagging_loss=0.008889, over 3052551.44 frames. 
], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:54:42,044 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570000 2023-11-29 03:54:51,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3800026.6666666665, ans=0.125 2023-11-29 03:54:56,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3800026.6666666665, ans=0.0 2023-11-29 03:55:04,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3800093.3333333335, ans=0.125 2023-11-29 03:55:13,236 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4900, loss[loss=0.08777, simple_loss=0.1134, pruned_loss=0.02275, audio_tagging_loss=0.008304, over 16005.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08931, pruned_loss=0.01215, audio_tagging_loss=0.008895, over 3051167.54 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:55:43,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.986e+01 9.622e+01 1.028e+02 1.398e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 03:55:44,466 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570050 2023-11-29 03:55:50,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3800360.0, ans=0.125 2023-11-29 03:55:54,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3800360.0, ans=0.125 2023-11-29 03:56:16,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3800426.6666666665, ans=0.05 2023-11-29 03:56:18,009 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4950, loss[loss=0.05828, simple_loss=0.09051, pruned_loss=0.00556, audio_tagging_loss=0.00747, over 15576.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08954, pruned_loss=0.01199, audio_tagging_loss=0.008629, over 3046026.77 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:56:47,092 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570100 2023-11-29 03:56:48,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3800626.6666666665, ans=0.125 2023-11-29 03:57:03,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3800693.3333333335, ans=0.1 2023-11-29 03:57:10,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3800760.0, ans=0.2 2023-11-29 03:57:18,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3800826.6666666665, ans=0.0 2023-11-29 03:57:19,422 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5000, loss[loss=0.06061, simple_loss=0.08904, pruned_loss=0.009502, audio_tagging_loss=0.006584, over 15304.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08953, pruned_loss=0.0119, audio_tagging_loss=0.008542, over 3048640.87 frames. 
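The per-batch `loss` values above are consistent with a fixed weighting of the logged components: 0.5 × simple_loss + pruned_loss + 1.0 × audio_tagging_loss (e.g. 0.5 × 0.1134 + 0.02275 + 0.008304 ≈ 0.08775, the batch-4900 entry's 0.08777 up to display rounding). A sketch of that combination; the scales are inferred from the printed numbers, not read out of the training script.

```python
# Scales inferred from the logged numbers (an assumption, not a quote of
# the script's config): loss = 0.5*simple + 1.0*pruned + 1.0*audio_tagging.
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float) -> float:
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Reproduces the batch-4900 entry above up to display rounding:
assert abs(combine_losses(0.1134, 0.02275, 0.008304) - 0.08777) < 1e-4
```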
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:57:30,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3800893.3333333335, ans=0.125 2023-11-29 03:57:33,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3800893.3333333335, ans=0.125 2023-11-29 03:57:35,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3800893.3333333335, ans=0.125 2023-11-29 03:57:38,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.75 vs. limit=10.0 2023-11-29 03:57:40,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.06 vs. limit=22.5 2023-11-29 03:57:41,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2023-11-29 03:57:48,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.990e+01 9.495e+01 1.030e+02 1.330e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 03:57:49,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570150 2023-11-29 03:57:51,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3800960.0, ans=0.125 2023-11-29 03:58:04,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3801026.6666666665, ans=0.125 2023-11-29 03:58:21,212 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5050, loss[loss=0.05681, simple_loss=0.0766, pruned_loss=0.008986, audio_tagging_loss=0.009518, over 14943.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08859, pruned_loss=0.01178, audio_tagging_loss=0.008438, over 3039798.00 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:58:27,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3801160.0, ans=0.125 2023-11-29 03:58:37,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.46 vs. 
limit=15.0 2023-11-29 03:58:38,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3801226.6666666665, ans=0.02 2023-11-29 03:58:38,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3801226.6666666665, ans=10.0 2023-11-29 03:58:43,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3801226.6666666665, ans=0.125 2023-11-29 03:58:50,468 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570200 2023-11-29 03:58:50,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3801293.3333333335, ans=0.07 2023-11-29 03:59:06,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3801360.0, ans=0.125 2023-11-29 03:59:22,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3801493.3333333335, ans=0.07 2023-11-29 03:59:22,889 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5100, loss[loss=0.06606, simple_loss=0.08877, pruned_loss=0.01311, audio_tagging_loss=0.008564, over 15605.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08887, pruned_loss=0.01175, audio_tagging_loss=0.008445, over 3046610.03 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:59:23,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3801493.3333333335, ans=0.1 2023-11-29 03:59:26,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3801493.3333333335, ans=0.125 2023-11-29 03:59:50,405 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 9.192e+01 9.667e+01 1.067e+02 2.138e+02, threshold=1.933e+02, percent-clipped=1.0 2023-11-29 03:59:51,716 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570250 2023-11-29 03:59:54,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3801626.6666666665, ans=0.125 2023-11-29 03:59:55,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-29 03:59:58,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801693.3333333335, ans=0.125 2023-11-29 04:00:05,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3801693.3333333335, ans=0.2 2023-11-29 04:00:14,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2023-11-29 04:00:16,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801760.0, ans=0.125 2023-11-29 04:00:17,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3801760.0, ans=0.125 2023-11-29 04:00:24,129 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5150, loss[loss=0.0547, simple_loss=0.07513, pruned_loss=0.009423, audio_tagging_loss=0.007713, over 14471.00 frames. 
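The `Whitening` lines compare a per-module whiteness statistic against a limit (e.g. metric=2.46 vs. limit=15.0); large values mean the channel covariance is far from isotropic. A sketch of one metric with that behaviour: the ratio of the covariance eigenvalues' second moment to their squared mean, which is 1.0 for perfectly white features and grows with spectral imbalance. This mirrors the logged quantity's behaviour, not necessarily its exact implementation.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns 1.0 for 'white' features."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]           # channel covariance
    eigs = torch.linalg.eigvalsh(cov)      # real eigenvalues, ascending
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(4000, 384)                 # near-white input
print(whitening_metric(x))                 # close to 1.0, far below limit=15
```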
], tot_loss[loss=0.06444, simple_loss=0.08862, pruned_loss=0.01166, audio_tagging_loss=0.00847, over 3048188.62 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:00:30,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-29 04:00:51,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3801960.0, ans=0.125 2023-11-29 04:00:53,283 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570300 2023-11-29 04:01:25,315 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5200, loss[loss=0.05465, simple_loss=0.07215, pruned_loss=0.01139, audio_tagging_loss=0.007182, over 14394.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08879, pruned_loss=0.01181, audio_tagging_loss=0.008481, over 3044569.69 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:01:31,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3802160.0, ans=0.2 2023-11-29 04:01:43,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3802226.6666666665, ans=0.125 2023-11-29 04:01:54,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.016e+01 9.699e+01 1.038e+02 1.418e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 04:01:55,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570350 2023-11-29 04:02:18,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3802426.6666666665, ans=0.1 2023-11-29 04:02:26,884 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5250, loss[loss=0.06843, simple_loss=0.09151, pruned_loss=0.01047, audio_tagging_loss=0.0122, over 15313.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.09008, pruned_loss=0.01194, audio_tagging_loss=0.008381, over 3051954.77 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:02:34,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2023-11-29 04:02:41,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2023-11-29 04:02:56,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570400 2023-11-29 04:03:11,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3802693.3333333335, ans=0.125 2023-11-29 04:03:17,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3802760.0, ans=0.05 2023-11-29 04:03:27,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-29 04:03:28,964 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5300, loss[loss=0.07434, simple_loss=0.09876, pruned_loss=0.01599, audio_tagging_loss=0.008976, over 14742.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09071, pruned_loss=0.01206, audio_tagging_loss=0.008256, over 3047352.02 frames. 
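The `grad_scale` field drifts between 32.0 and 16.0 in the entries above (and down to 8.0 a little further on), which is the signature of dynamic fp16 loss scaling: the scale is halved whenever a step produces inf/nan gradients and grows back after a run of clean steps. A minimal sketch using PyTorch's standard scaler; the growth interval is illustrative, not this run's setting.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # the scale seen at the start of this stretch
    backoff_factor=0.5,    # halve on overflow: 32 -> 16 -> 8
    growth_factor=2.0,     # double again after enough clean steps
    growth_interval=2000,  # illustrative; the run's actual value may differ
)
# Typical step:
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()   # update() is where the 32 -> 16 -> 8 moves happen
```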
], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:03:57,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.209e+01 9.143e+01 9.635e+01 1.038e+02 1.334e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 04:03:57,846 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570450 2023-11-29 04:04:00,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3802960.0, ans=0.04949747468305833 2023-11-29 04:04:02,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3802960.0, ans=0.125 2023-11-29 04:04:21,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3803093.3333333335, ans=0.1 2023-11-29 04:04:29,653 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5350, loss[loss=0.06422, simple_loss=0.08871, pruned_loss=0.01231, audio_tagging_loss=0.00755, over 16087.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09038, pruned_loss=0.01215, audio_tagging_loss=0.008369, over 3041951.64 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:04:43,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3803226.6666666665, ans=0.125 2023-11-29 04:04:55,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2023-11-29 04:05:00,560 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570500 2023-11-29 04:05:00,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3803293.3333333335, ans=0.05 2023-11-29 04:05:13,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3803360.0, ans=0.025 2023-11-29 04:05:16,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3803360.0, ans=0.125 2023-11-29 04:05:19,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3803426.6666666665, ans=0.125 2023-11-29 04:05:31,596 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5400, loss[loss=0.0435, simple_loss=0.06165, pruned_loss=0.005089, audio_tagging_loss=0.007588, over 14515.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09181, pruned_loss=0.01225, audio_tagging_loss=0.008316, over 3036875.00 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:05:32,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3803493.3333333335, ans=0.125 2023-11-29 04:05:38,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3803493.3333333335, ans=0.0 2023-11-29 04:05:52,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.67 vs. 
limit=15.0 2023-11-29 04:06:01,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 9.108e+01 9.705e+01 1.034e+02 1.334e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-29 04:06:01,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570550 2023-11-29 04:06:14,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3803693.3333333335, ans=0.2 2023-11-29 04:06:33,398 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5450, loss[loss=0.06948, simple_loss=0.08825, pruned_loss=0.01376, audio_tagging_loss=0.01159, over 14499.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09041, pruned_loss=0.01213, audio_tagging_loss=0.008392, over 3038583.14 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:06:48,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3803893.3333333335, ans=0.125 2023-11-29 04:07:03,391 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570600 2023-11-29 04:07:35,612 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5500, loss[loss=0.07076, simple_loss=0.1074, pruned_loss=0.01079, audio_tagging_loss=0.006283, over 14467.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09003, pruned_loss=0.01202, audio_tagging_loss=0.008503, over 3041014.81 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:07:38,473 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0 2023-11-29 04:08:03,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=12.0 2023-11-29 04:08:05,652 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570650 2023-11-29 04:08:06,696 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 9.091e+01 9.676e+01 1.052e+02 2.081e+02, threshold=1.935e+02, percent-clipped=1.0 2023-11-29 04:08:32,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=12.0 2023-11-29 04:08:37,437 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5550, loss[loss=0.05937, simple_loss=0.08493, pruned_loss=0.00916, audio_tagging_loss=0.007743, over 15030.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09017, pruned_loss=0.01203, audio_tagging_loss=0.008539, over 3039949.49 frames. 
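The learning rate in each summary decays very slowly here (1.41e-03, slipping to 1.40e-03 a little over a thousand batches later), consistent with the tail of a power-law schedule in both batch count and epoch. A sketch of an Eden-style rule; the `base_lr`, `lr_batches`, and `lr_epochs` constants below are invented placeholders, not this run's configuration.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
    """Eden-style schedule: smooth power-law decay in batch and epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With these placeholder constants, deep into training the curve is nearly
# flat and lands at the same order as the logged 1.40e-03:
print(eden_lr(0.05, 570000, 48))  # ~1.35e-03
```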
], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 04:08:37,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3804493.3333333335, ans=0.125 2023-11-29 04:08:53,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3804560.0, ans=0.125 2023-11-29 04:09:00,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3804626.6666666665, ans=0.1 2023-11-29 04:09:07,005 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570700 2023-11-29 04:09:24,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3804693.3333333335, ans=0.0 2023-11-29 04:09:26,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3804760.0, ans=0.125 2023-11-29 04:09:35,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3804760.0, ans=0.1 2023-11-29 04:09:39,322 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5600, loss[loss=0.06355, simple_loss=0.08804, pruned_loss=0.01102, audio_tagging_loss=0.008512, over 15596.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.09003, pruned_loss=0.01191, audio_tagging_loss=0.008708, over 3048697.83 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:10:03,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3804960.0, ans=0.125 2023-11-29 04:10:08,865 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570750 2023-11-29 04:10:09,939 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.028e+01 9.748e+01 1.040e+02 1.265e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 04:10:24,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3805026.6666666665, ans=0.125 2023-11-29 04:10:26,119 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:10:38,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-29 04:10:40,924 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5650, loss[loss=0.06587, simple_loss=0.09335, pruned_loss=0.01118, audio_tagging_loss=0.008014, over 14853.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08959, pruned_loss=0.01185, audio_tagging_loss=0.008769, over 3055986.65 frames. 
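The WARNING above drops an AudioSet cut because, after the front-end's subsampling, it has fewer encoder frames (23) than BPE tokens (24), so no transducer alignment can exist. A sketch of that admissibility check; the exact front-end arithmetic is an assumption that happens to reproduce the logged 100 -> 23 mapping.

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose subsampled length cannot cover their token count."""
    # Hypothetical two-stage conv front-end arithmetic; it maps the
    # logged 100 input frames to the logged 23 output frames.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    return frames_after >= num_tokens

# The excluded dummy-text cut: 100 frames, 24 tokens -> not trainable.
print(keep_cut(100, 24))   # False
print(keep_cut(200, 24))   # True: (200-7)//2 = 96; (96+1)//2 = 48 >= 24
```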
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:11:10,731 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570800 2023-11-29 04:11:33,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3805426.6666666665, ans=0.125 2023-11-29 04:11:33,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. limit=10.0 2023-11-29 04:11:42,456 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5700, loss[loss=0.0739, simple_loss=0.09746, pruned_loss=0.01588, audio_tagging_loss=0.009289, over 15415.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08967, pruned_loss=0.01207, audio_tagging_loss=0.008747, over 3047363.14 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:11:50,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3805493.3333333335, ans=0.0 2023-11-29 04:11:58,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3805560.0, ans=0.125 2023-11-29 04:11:59,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3805560.0, ans=0.2 2023-11-29 04:12:11,931 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570850 2023-11-29 04:12:13,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 9.102e+01 9.721e+01 1.096e+02 1.374e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 04:12:27,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=22.5 2023-11-29 04:12:28,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3805693.3333333335, ans=0.0 2023-11-29 04:12:33,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3805760.0, ans=0.125 2023-11-29 04:12:38,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3805760.0, ans=0.015 2023-11-29 04:12:43,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3805826.6666666665, ans=0.125 2023-11-29 04:12:44,429 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5750, loss[loss=0.05282, simple_loss=0.07561, pruned_loss=0.008855, audio_tagging_loss=0.006161, over 14479.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08926, pruned_loss=0.01204, audio_tagging_loss=0.008586, over 3051042.45 frames. 
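`tot_loss[...]` aggregates the per-batch losses into a running, frame-weighted average; the fractional frame counts it reports (e.g. "over 3051042.45 frames") indicate a decayed accumulator rather than a plain sum. A sketch of one such accumulator; the decay constant is an assumption.

```python
class RunningLoss:
    """Frame-weighted running loss with exponential decay (sketch).

    Decaying both sums yields the fractional "over N frames" counts seen
    in the tot_loss lines; decay=0.999 is an assumed constant.
    """

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, num_frames: int) -> None:
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)
```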
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:12:58,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3805893.3333333335, ans=0.125 2023-11-29 04:13:13,036 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570900 2023-11-29 04:13:13,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3805960.0, ans=0.04949747468305833 2023-11-29 04:13:34,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3806093.3333333335, ans=0.125 2023-11-29 04:13:44,343 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5800, loss[loss=0.07373, simple_loss=0.1027, pruned_loss=0.01616, audio_tagging_loss=0.006219, over 14719.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08994, pruned_loss=0.01216, audio_tagging_loss=0.0084, over 3042163.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:14:11,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=22.5 2023-11-29 04:14:14,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-29 04:14:14,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570950 2023-11-29 04:14:15,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.950e+01 9.520e+01 1.017e+02 1.550e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-29 04:14:22,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806360.0, ans=0.1 2023-11-29 04:14:23,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-29 04:14:25,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3806360.0, ans=0.0 2023-11-29 04:14:25,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3806360.0, ans=0.2 2023-11-29 04:14:33,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-29 04:14:46,532 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5850, loss[loss=0.05084, simple_loss=0.07159, pruned_loss=0.006472, audio_tagging_loss=0.00857, over 15750.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08947, pruned_loss=0.01212, audio_tagging_loss=0.00845, over 3038946.40 frames. 
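Many of the scheduled values above are skip rates (`attention_skip_rate`, `conv_skip_rate`, `bypass.skip_rate` ≈ 0.0495): with that probability a sub-module's contribution is dropped during training, a stochastic-depth-style regularizer. A minimal sketch, assuming batch-level skipping; the real modules may skip per-sequence or per-frame.

```python
import torch

def residual_with_skip(residual: torch.Tensor, module_out: torch.Tensor,
                       skip_rate: float, training: bool) -> torch.Tensor:
    """With probability skip_rate during training, drop the sub-module."""
    if training and torch.rand(()).item() < skip_rate:
        return residual                # module skipped this time
    return residual + module_out

# At skip_rate ~0.0495 the branch is dropped roughly once every 20 batches.
```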
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:14:51,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3806493.3333333335, ans=0.1 2023-11-29 04:14:55,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3806493.3333333335, ans=0.125 2023-11-29 04:15:01,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3806560.0, ans=0.0 2023-11-29 04:15:11,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3806626.6666666665, ans=0.0 2023-11-29 04:15:15,844 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571000 2023-11-29 04:15:42,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3806760.0, ans=0.1 2023-11-29 04:15:49,139 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5900, loss[loss=0.06851, simple_loss=0.09555, pruned_loss=0.0116, audio_tagging_loss=0.009137, over 14885.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09008, pruned_loss=0.01227, audio_tagging_loss=0.008453, over 3042245.36 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:15:54,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806826.6666666665, ans=0.1 2023-11-29 04:15:56,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3806826.6666666665, ans=0.1 2023-11-29 04:16:10,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3806893.3333333335, ans=0.2 2023-11-29 04:16:17,703 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571050 2023-11-29 04:16:17,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3806960.0, ans=0.125 2023-11-29 04:16:18,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.746e+01 9.359e+01 9.876e+01 1.067e+02 1.252e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-29 04:16:30,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3807026.6666666665, ans=0.125 2023-11-29 04:16:47,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3807093.3333333335, ans=0.2 2023-11-29 04:16:50,127 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5950, loss[loss=0.05361, simple_loss=0.06292, pruned_loss=0.006171, audio_tagging_loss=0.01597, over 13681.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.09001, pruned_loss=0.01219, audio_tagging_loss=0.008425, over 3042533.59 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:16:51,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.43 vs. 
limit=22.5 2023-11-29 04:17:19,908 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571100 2023-11-29 04:17:25,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3807293.3333333335, ans=0.125 2023-11-29 04:17:30,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3807360.0, ans=0.0 2023-11-29 04:17:42,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3807426.6666666665, ans=0.0 2023-11-29 04:17:51,338 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6000, loss[loss=0.08117, simple_loss=0.1155, pruned_loss=0.01621, audio_tagging_loss=0.007203, over 15589.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.0899, pruned_loss=0.01204, audio_tagging_loss=0.00844, over 3041918.30 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:17:51,339 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 04:18:22,436 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6137, 3.6310, 3.9688, 3.3949], device='cuda:3') 2023-11-29 04:18:31,355 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05827, simple_loss=0.05042, pruned_loss=0.005313, audio_tagging_loss=0.02774, over 4681554.00 frames. 2023-11-29 04:18:31,356 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 04:18:37,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3807493.3333333335, ans=0.1 2023-11-29 04:19:00,343 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571150 2023-11-29 04:19:00,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3807626.6666666665, ans=0.0 2023-11-29 04:19:01,357 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.999e+01 9.693e+01 1.031e+02 2.165e+02, threshold=1.939e+02, percent-clipped=1.0 2023-11-29 04:19:05,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3807626.6666666665, ans=0.125 2023-11-29 04:19:09,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3807693.3333333335, ans=0.0 2023-11-29 04:19:19,603 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:19:32,549 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6050, loss[loss=0.04993, simple_loss=0.06321, pruned_loss=0.008402, audio_tagging_loss=0.00992, over 14279.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08995, pruned_loss=0.01205, audio_tagging_loss=0.008442, over 3048511.76 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:19:42,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.78 vs. 
limit=15.0 2023-11-29 04:19:51,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3807893.3333333335, ans=0.0 2023-11-29 04:20:02,428 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571200 2023-11-29 04:20:16,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3808026.6666666665, ans=0.125 2023-11-29 04:20:16,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3808026.6666666665, ans=0.0 2023-11-29 04:20:19,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3808026.6666666665, ans=0.1 2023-11-29 04:20:30,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3808093.3333333335, ans=0.0 2023-11-29 04:20:34,995 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6100, loss[loss=0.05427, simple_loss=0.0765, pruned_loss=0.008631, audio_tagging_loss=0.00739, over 16107.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08975, pruned_loss=0.01204, audio_tagging_loss=0.008373, over 3059909.75 frames. ], batch size: 64, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:20:35,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3808160.0, ans=0.125 2023-11-29 04:20:45,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3808160.0, ans=0.1 2023-11-29 04:20:46,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2023-11-29 04:20:49,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3808226.6666666665, ans=0.125 2023-11-29 04:21:05,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571250 2023-11-29 04:21:07,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.969e+01 9.609e+01 1.049e+02 1.338e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-29 04:21:18,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3808360.0, ans=0.0 2023-11-29 04:21:37,886 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6150, loss[loss=0.06711, simple_loss=0.08702, pruned_loss=0.01522, audio_tagging_loss=0.008379, over 15359.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09011, pruned_loss=0.01223, audio_tagging_loss=0.008354, over 3056194.53 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:21:40,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2023-11-29 04:21:49,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. 
limit=15.0 2023-11-29 04:22:05,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3808626.6666666665, ans=0.0 2023-11-29 04:22:07,177 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571300 2023-11-29 04:22:11,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3808626.6666666665, ans=0.0 2023-11-29 04:22:14,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.62 vs. limit=15.0 2023-11-29 04:22:38,911 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6200, loss[loss=0.06814, simple_loss=0.09407, pruned_loss=0.01076, audio_tagging_loss=0.01035, over 15334.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09015, pruned_loss=0.01222, audio_tagging_loss=0.008506, over 3053293.83 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:22:42,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3808826.6666666665, ans=0.0 2023-11-29 04:22:42,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3808826.6666666665, ans=0.1 2023-11-29 04:23:08,407 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571350 2023-11-29 04:23:10,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.947e+01 9.565e+01 1.046e+02 1.413e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 04:23:19,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2023-11-29 04:23:32,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3809093.3333333335, ans=0.125 2023-11-29 04:23:36,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2023-11-29 04:23:40,236 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6250, loss[loss=0.06817, simple_loss=0.08411, pruned_loss=0.01637, audio_tagging_loss=0.009739, over 14398.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09028, pruned_loss=0.01223, audio_tagging_loss=0.008592, over 3051226.08 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:24:10,207 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571400 2023-11-29 04:24:20,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3809360.0, ans=0.025 2023-11-29 04:24:41,932 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6300, loss[loss=0.06033, simple_loss=0.08386, pruned_loss=0.009915, audio_tagging_loss=0.00848, over 15469.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09042, pruned_loss=0.01222, audio_tagging_loss=0.008675, over 3056206.74 frames. 
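The validation pass at batch 6000 above also reports peak GPU memory ("Maximum memory allocated so far is 24894MB"), which is a one-line query in PyTorch. A sketch of a periodic-validation skeleton producing those lines; the interval and names are illustrative (batch 6000 is merely consistent with a 3000-batch interval).

```python
import torch

VALID_INTERVAL = 3000  # illustrative; consistent with validation at batch 6000

def maybe_validate(batch_idx: int, compute_validation_loss) -> None:
    """Run validation every VALID_INTERVAL batches and report peak memory."""
    if batch_idx % VALID_INTERVAL != 0:
        return
    with torch.no_grad():
        valid_loss = compute_validation_loss()
    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"validation: loss={valid_loss:.4g}")
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```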
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:25:11,500 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571450 2023-11-29 04:25:13,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 9.159e+01 9.734e+01 1.043e+02 1.366e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 04:25:20,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=15.0 2023-11-29 04:25:43,844 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6350, loss[loss=0.08127, simple_loss=0.1186, pruned_loss=0.01299, audio_tagging_loss=0.009001, over 14525.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09123, pruned_loss=0.01226, audio_tagging_loss=0.008696, over 3050525.94 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:26:12,662 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571500 2023-11-29 04:26:12,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3809960.0, ans=0.125 2023-11-29 04:26:15,677 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:26:15,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3809960.0, ans=0.0 2023-11-29 04:26:18,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3809960.0, ans=0.0 2023-11-29 04:26:19,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3810026.6666666665, ans=0.125 2023-11-29 04:26:27,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3810026.6666666665, ans=0.0 2023-11-29 04:26:45,494 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6400, loss[loss=0.06875, simple_loss=0.08854, pruned_loss=0.01367, audio_tagging_loss=0.01081, over 14552.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08967, pruned_loss=0.01203, audio_tagging_loss=0.008791, over 3044926.59 frames. 
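The `attn_weights_entropy` tensor printed during the validation pass above is a per-head average entropy of the attention distributions: values near zero would mean heads locked onto single positions, larger values mean diffuse attention. A sketch of that diagnostic, with assumed tensor shapes.

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, num_queries, num_keys), rows summing to 1.
    Returns one averaged entropy per head, as in the logged tensor."""
    ent = -(attn * (attn + 1.0e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 64, 64), dim=-1)
print(attn_weights_entropy(attn))  # a bit below log(64) ~ 4.16 per head
```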
], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:27:00,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3810226.6666666665, ans=0.0 2023-11-29 04:27:06,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3810226.6666666665, ans=0.125 2023-11-29 04:27:06,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3810226.6666666665, ans=0.125 2023-11-29 04:27:15,270 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571550 2023-11-29 04:27:17,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.796e+01 9.535e+01 1.038e+02 1.501e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 04:27:21,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3810360.0, ans=0.5 2023-11-29 04:27:45,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3810493.3333333335, ans=0.015 2023-11-29 04:27:46,656 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6450, loss[loss=0.05243, simple_loss=0.07005, pruned_loss=0.009729, audio_tagging_loss=0.007681, over 14892.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08908, pruned_loss=0.01194, audio_tagging_loss=0.008835, over 3041281.36 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:27:52,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3810493.3333333335, ans=0.0 2023-11-29 04:27:54,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.75 vs. limit=6.0 2023-11-29 04:28:16,526 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571600 2023-11-29 04:28:48,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3810826.6666666665, ans=0.125 2023-11-29 04:28:49,418 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6500, loss[loss=0.06293, simple_loss=0.0841, pruned_loss=0.01267, audio_tagging_loss=0.008212, over 14872.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08918, pruned_loss=0.01193, audio_tagging_loss=0.008767, over 3044779.73 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:28:53,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3810826.6666666665, ans=0.125 2023-11-29 04:29:02,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2023-11-29 04:29:05,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.40 vs. 
limit=22.5 2023-11-29 04:29:18,222 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571650 2023-11-29 04:29:20,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.207e+01 9.940e+01 1.055e+02 1.312e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 04:29:24,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3811026.6666666665, ans=0.0 2023-11-29 04:29:32,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-29 04:29:50,417 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6550, loss[loss=0.05341, simple_loss=0.06542, pruned_loss=0.008968, audio_tagging_loss=0.01173, over 14329.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.0896, pruned_loss=0.01217, audio_tagging_loss=0.008641, over 3046898.98 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:29:58,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3811160.0, ans=0.125 2023-11-29 04:30:00,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3811160.0, ans=0.2 2023-11-29 04:30:03,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-11-29 04:30:19,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3811293.3333333335, ans=0.125 2023-11-29 04:30:19,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3811293.3333333335, ans=0.2 2023-11-29 04:30:20,575 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571700 2023-11-29 04:30:27,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3811360.0, ans=0.125 2023-11-29 04:30:43,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0 2023-11-29 04:30:46,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3811426.6666666665, ans=0.2 2023-11-29 04:30:47,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.19 vs. limit=10.0 2023-11-29 04:30:52,152 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6600, loss[loss=0.05039, simple_loss=0.06303, pruned_loss=0.008912, audio_tagging_loss=0.009966, over 14620.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08964, pruned_loss=0.01217, audio_tagging_loss=0.008479, over 3043511.39 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:30:56,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.93 vs. limit=22.5 2023-11-29 04:31:00,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.70 vs. 
limit=15.0 2023-11-29 04:31:10,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3811560.0, ans=0.125 2023-11-29 04:31:21,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3811626.6666666665, ans=0.125 2023-11-29 04:31:22,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571750 2023-11-29 04:31:24,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.047e+01 9.716e+01 1.044e+02 1.337e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 04:31:28,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3811693.3333333335, ans=0.05 2023-11-29 04:31:54,296 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6650, loss[loss=0.07018, simple_loss=0.09918, pruned_loss=0.01231, audio_tagging_loss=0.008282, over 15487.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08876, pruned_loss=0.0119, audio_tagging_loss=0.008577, over 3045792.70 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:32:05,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3811893.3333333335, ans=0.125 2023-11-29 04:32:08,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3811893.3333333335, ans=0.125 2023-11-29 04:32:13,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2023-11-29 04:32:14,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3811893.3333333335, ans=0.125 2023-11-29 04:32:18,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3811960.0, ans=0.0 2023-11-29 04:32:22,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3811960.0, ans=22.5 2023-11-29 04:32:23,972 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571800 2023-11-29 04:32:52,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3812093.3333333335, ans=0.125 2023-11-29 04:32:56,022 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6700, loss[loss=0.07038, simple_loss=0.09522, pruned_loss=0.01654, audio_tagging_loss=0.006228, over 15342.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08916, pruned_loss=0.01205, audio_tagging_loss=0.008456, over 3047552.94 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:33:15,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3812226.6666666665, ans=0.125 2023-11-29 04:33:25,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571850 2023-11-29 04:33:29,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.065e+01 9.575e+01 1.004e+02 1.192e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 04:33:44,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3812426.6666666665, ans=0.5 2023-11-29 04:33:55,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3812426.6666666665, ans=0.0 2023-11-29 04:33:57,355 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6750, loss[loss=0.07006, simple_loss=0.09712, pruned_loss=0.01331, audio_tagging_loss=0.008183, over 14598.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08943, pruned_loss=0.01209, audio_tagging_loss=0.008477, over 3041044.69 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:34:26,720 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571900 2023-11-29 04:34:44,685 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:34:54,663 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:34:54,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3812760.0, ans=0.125 2023-11-29 04:34:59,694 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6800, loss[loss=0.06303, simple_loss=0.09303, pruned_loss=0.01038, audio_tagging_loss=0.006138, over 15337.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08945, pruned_loss=0.01218, audio_tagging_loss=0.008403, over 3037513.70 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:35:00,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3812826.6666666665, ans=0.125 2023-11-29 04:35:28,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3812960.0, ans=0.1 2023-11-29 04:35:29,205 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571950 2023-11-29 04:35:32,505 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.971e+01 9.540e+01 1.002e+02 2.888e+02, threshold=1.908e+02, percent-clipped=1.0 2023-11-29 04:36:00,795 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6850, loss[loss=0.0678, simple_loss=0.0832, pruned_loss=0.01656, audio_tagging_loss=0.009646, over 14551.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08966, pruned_loss=0.01218, audio_tagging_loss=0.008429, over 3047024.45 frames. 
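The many `balancer*` schedules above (`min_positive`, `max_abs`, `prob` = 0.125) belong to activation-balancing modules: on a random ~12.5% of batches they check per-channel activation statistics against limits and nudge channels that drift outside them. A sketch of just the statistics check; the gradient-side correction the real module applies is omitted.

```python
import torch

def balancer_stats(x: torch.Tensor, min_positive: float = 0.05,
                   max_abs: float = 10.0) -> dict:
    """x: (num_frames, num_channels). Count per-channel violations."""
    frac_positive = (x > 0).float().mean(dim=0)  # fraction of positive values
    mean_abs = x.abs().mean(dim=0)               # mean magnitude per channel
    return {
        "channels_too_negative": int((frac_positive < min_positive).sum()),
        "channels_too_large": int((mean_abs > max_abs).sum()),
    }

print(balancer_stats(torch.randn(1000, 256)))    # healthy input: both 0
```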
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:36:27,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3813293.3333333335, ans=0.125 2023-11-29 04:36:30,965 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572000 2023-11-29 04:36:43,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.04 vs. limit=15.0 2023-11-29 04:36:45,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3813360.0, ans=0.125 2023-11-29 04:37:00,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2023-11-29 04:37:05,102 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6900, loss[loss=0.06114, simple_loss=0.0795, pruned_loss=0.01235, audio_tagging_loss=0.009043, over 14748.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08951, pruned_loss=0.01203, audio_tagging_loss=0.008459, over 3048675.37 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:37:06,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3813493.3333333335, ans=0.0 2023-11-29 04:37:12,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3813493.3333333335, ans=0.125 2023-11-29 04:37:34,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572050 2023-11-29 04:37:38,059 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.005e+01 9.691e+01 1.035e+02 1.354e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 04:37:55,713 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:38:06,648 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6950, loss[loss=0.06132, simple_loss=0.08717, pruned_loss=0.009037, audio_tagging_loss=0.008702, over 15034.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08954, pruned_loss=0.01199, audio_tagging_loss=0.008461, over 3047233.11 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:38:27,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3813893.3333333335, ans=0.125 2023-11-29 04:38:36,791 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572100 2023-11-29 04:38:40,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3813960.0, ans=0.125 2023-11-29 04:38:44,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.41 vs. 
limit=22.5 2023-11-29 04:38:45,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3814026.6666666665, ans=0.09899494936611666 2023-11-29 04:38:46,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3814026.6666666665, ans=0.125 2023-11-29 04:39:07,952 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7000, loss[loss=0.07049, simple_loss=0.09525, pruned_loss=0.01545, audio_tagging_loss=0.007415, over 15117.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08942, pruned_loss=0.01198, audio_tagging_loss=0.008478, over 3040910.95 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:39:10,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3814160.0, ans=0.125 2023-11-29 04:39:29,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3814226.6666666665, ans=0.125 2023-11-29 04:39:37,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-11-29 04:39:38,324 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572150 2023-11-29 04:39:43,423 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.947e+01 9.387e+01 1.017e+02 2.856e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-29 04:39:51,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3814360.0, ans=0.0 2023-11-29 04:39:55,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3814360.0, ans=0.125 2023-11-29 04:40:07,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3814426.6666666665, ans=0.125 2023-11-29 04:40:09,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2023-11-29 04:40:10,536 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7050, loss[loss=0.05846, simple_loss=0.08029, pruned_loss=0.01113, audio_tagging_loss=0.007195, over 15384.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08964, pruned_loss=0.01196, audio_tagging_loss=0.008557, over 3042898.40 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:40:38,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3814626.6666666665, ans=0.125 2023-11-29 04:40:39,614 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572200 2023-11-29 04:40:58,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3814760.0, ans=0.5 2023-11-29 04:41:12,059 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7100, loss[loss=0.06928, simple_loss=0.1165, pruned_loss=0.005649, audio_tagging_loss=0.005371, over 13999.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08936, pruned_loss=0.01184, audio_tagging_loss=0.008739, over 3049863.07 frames. 
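The optim.py records are internally consistent: each prints five quantiles (min, 25%, median, 75%, max) of recent gradient norms, and the printed threshold equals Clipping_scale times the median (2.0 * 9.575e+01 = 1.915e+02 in the first such record above). A sketch of that bookkeeping, assuming the norms are kept in a 1-D float tensor and reported periodically:

    import torch

    def clip_threshold_report(grad_norms: torch.Tensor,
                              clipping_scale: float = 2.0) -> torch.Tensor:
        # grad_norms: 1-D float tensor of recent per-step gradient norms.
        probs = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        q = torch.quantile(grad_norms, probs)
        threshold = clipping_scale * q[2]  # scale * median, as in the log
        pct = float((grad_norms > threshold).float().mean() * 100.0)
        print("Clipping_scale=%.1f, grad-norm quartiles %s, "
              "threshold=%.3e, percent-clipped=%.1f"
              % (clipping_scale,
                 " ".join("%.3e" % float(v) for v in q),
                 float(threshold), pct))
        return threshold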
], batch size: 54, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 04:41:13,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3814826.6666666665, ans=0.0 2023-11-29 04:41:15,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3814826.6666666665, ans=0.125 2023-11-29 04:41:19,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3814826.6666666665, ans=0.125 2023-11-29 04:41:26,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3814893.3333333335, ans=0.0 2023-11-29 04:41:40,529 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572250 2023-11-29 04:41:42,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3814960.0, ans=0.0 2023-11-29 04:41:47,406 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.957e+01 9.566e+01 1.017e+02 1.804e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 04:41:58,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3815026.6666666665, ans=0.1 2023-11-29 04:42:12,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3815160.0, ans=0.125 2023-11-29 04:42:13,128 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7150, loss[loss=0.0561, simple_loss=0.06584, pruned_loss=0.01188, audio_tagging_loss=0.0113, over 15195.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08999, pruned_loss=0.01204, audio_tagging_loss=0.008713, over 3064217.18 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 04:42:42,929 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572300 2023-11-29 04:42:46,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3815293.3333333335, ans=0.07 2023-11-29 04:42:56,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3815360.0, ans=0.125 2023-11-29 04:43:13,867 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7200, loss[loss=0.05557, simple_loss=0.06934, pruned_loss=0.0107, audio_tagging_loss=0.01021, over 15249.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08949, pruned_loss=0.01199, audio_tagging_loss=0.00886, over 3054194.48 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:43:40,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3815626.6666666665, ans=0.5 2023-11-29 04:43:43,314 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:43:44,225 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572350 2023-11-29 04:43:50,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.918e+01 9.002e+01 9.674e+01 1.041e+02 1.826e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 04:43:56,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. 
limit=12.0 2023-11-29 04:44:15,612 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7250, loss[loss=0.06231, simple_loss=0.08159, pruned_loss=0.0124, audio_tagging_loss=0.009109, over 15316.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08982, pruned_loss=0.01199, audio_tagging_loss=0.008815, over 3053143.13 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:44:22,541 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:44:38,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3815960.0, ans=0.125 2023-11-29 04:44:39,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3815960.0, ans=10.0 2023-11-29 04:44:44,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572400 2023-11-29 04:44:44,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3815960.0, ans=0.0 2023-11-29 04:44:48,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3815960.0, ans=0.0 2023-11-29 04:45:05,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3816093.3333333335, ans=0.035 2023-11-29 04:45:05,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3816093.3333333335, ans=0.125 2023-11-29 04:45:12,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3816093.3333333335, ans=0.125 2023-11-29 04:45:18,374 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7300, loss[loss=0.05306, simple_loss=0.07346, pruned_loss=0.008099, audio_tagging_loss=0.00823, over 15076.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08954, pruned_loss=0.01203, audio_tagging_loss=0.00872, over 3044369.39 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:45:22,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=12.0 2023-11-29 04:45:28,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=12.0 2023-11-29 04:45:36,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3816226.6666666665, ans=0.125 2023-11-29 04:45:48,163 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572450 2023-11-29 04:45:48,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3816293.3333333335, ans=0.2 2023-11-29 04:45:49,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3816293.3333333335, ans=0.125 2023-11-29 04:45:54,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 8.998e+01 9.655e+01 1.011e+02 1.283e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 04:46:04,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.12 vs. 
limit=15.0 2023-11-29 04:46:19,725 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7350, loss[loss=0.07897, simple_loss=0.1012, pruned_loss=0.01993, audio_tagging_loss=0.008456, over 15402.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09073, pruned_loss=0.01215, audio_tagging_loss=0.008609, over 3049655.40 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:46:45,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3816626.6666666665, ans=0.125 2023-11-29 04:46:50,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572500 2023-11-29 04:47:01,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3816693.3333333335, ans=0.1 2023-11-29 04:47:20,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3816826.6666666665, ans=0.125 2023-11-29 04:47:21,165 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7400, loss[loss=0.07848, simple_loss=0.1079, pruned_loss=0.01461, audio_tagging_loss=0.009922, over 14525.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09073, pruned_loss=0.01209, audio_tagging_loss=0.008481, over 3051463.51 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:47:26,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3816826.6666666665, ans=0.0 2023-11-29 04:47:38,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3816893.3333333335, ans=0.125 2023-11-29 04:47:51,294 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572550 2023-11-29 04:47:56,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3816960.0, ans=0.125 2023-11-29 04:47:56,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.857e+01 9.571e+01 1.032e+02 1.214e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 04:48:03,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3817026.6666666665, ans=0.125 2023-11-29 04:48:14,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3817093.3333333335, ans=0.125 2023-11-29 04:48:15,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3817093.3333333335, ans=0.2 2023-11-29 04:48:23,984 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7450, loss[loss=0.06386, simple_loss=0.09437, pruned_loss=0.00971, audio_tagging_loss=0.006967, over 14838.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08945, pruned_loss=0.01198, audio_tagging_loss=0.008553, over 3053654.06 frames. 
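The ScheduledFloat records that dominate this stretch of the log print a module hyperparameter (ans) as a function of the global batch_count. A piecewise-linear schedule over (batch_count, value) breakpoints reproduces the behavior; the class below is a minimal stand-in, not the full implementation in icefall's scaling.py:

    import bisect

    class ScheduledFloatSketch:
        """Float whose value is piecewise-linear in batch_count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count
            self.x = [p[0] for p in points]
            self.y = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect.bisect_right(self.x, batch_count)
            x0, x1 = self.x[i - 1], self.x[i]
            y0, y1 = self.y[i - 1], self.y[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A balancer prob that decayed early in training and has long since
    # reached its floor (these breakpoints are illustrative only):
    prob = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.125))
    print(prob.value(3812226.67))  # -> 0.125, matching ans=0.125 above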
], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:48:25,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3817160.0, ans=0.2 2023-11-29 04:48:49,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3817293.3333333335, ans=0.025 2023-11-29 04:48:52,808 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572600 2023-11-29 04:48:57,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3817293.3333333335, ans=0.0 2023-11-29 04:48:58,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3817293.3333333335, ans=0.0 2023-11-29 04:49:07,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3817360.0, ans=0.0 2023-11-29 04:49:07,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3817360.0, ans=0.2 2023-11-29 04:49:08,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=22.5 2023-11-29 04:49:10,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3817360.0, ans=0.1 2023-11-29 04:49:12,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3817426.6666666665, ans=0.125 2023-11-29 04:49:25,551 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7500, loss[loss=0.06389, simple_loss=0.08748, pruned_loss=0.01183, audio_tagging_loss=0.008328, over 14928.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08858, pruned_loss=0.01181, audio_tagging_loss=0.008604, over 3055095.47 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:49:28,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3817493.3333333335, ans=0.125 2023-11-29 04:49:31,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3817493.3333333335, ans=0.1 2023-11-29 04:49:54,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.09 vs. limit=22.5 2023-11-29 04:49:56,468 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572650 2023-11-29 04:50:02,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 9.110e+01 9.749e+01 1.048e+02 1.256e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 04:50:27,261 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7550, loss[loss=0.05955, simple_loss=0.08603, pruned_loss=0.01001, audio_tagging_loss=0.006522, over 15345.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08795, pruned_loss=0.01171, audio_tagging_loss=0.00853, over 3055374.31 frames. 
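grad_scale flips between 8.0, 16.0 and 32.0 across the loss records, the classic signature of dynamic fp16 loss scaling: the scale halves when a step overflows and creeps back up after a run of clean steps. A generic torch.cuda.amp loop showing the mechanism (model, criterion and the batch keys are placeholders, not the project's training loop):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, criterion, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if grads overflowed
        scaler.update()         # halves the scale on overflow, else may grow
        return loss.detach(), scaler.get_scale()  # get_scale() == grad_scale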
], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:50:32,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3817826.6666666665, ans=0.125 2023-11-29 04:50:35,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3817826.6666666665, ans=0.2 2023-11-29 04:50:39,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3817893.3333333335, ans=0.125 2023-11-29 04:50:57,291 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572700 2023-11-29 04:51:10,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-11-29 04:51:13,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3818026.6666666665, ans=0.1 2023-11-29 04:51:17,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3818093.3333333335, ans=0.2 2023-11-29 04:51:29,819 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7600, loss[loss=0.05333, simple_loss=0.0754, pruned_loss=0.008863, audio_tagging_loss=0.006761, over 14884.00 frames. ], tot_loss[loss=0.06381, simple_loss=0.08746, pruned_loss=0.01159, audio_tagging_loss=0.008489, over 3058194.39 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:51:35,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.25 vs. limit=10.0 2023-11-29 04:51:58,831 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572750 2023-11-29 04:52:01,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3818293.3333333335, ans=0.125 2023-11-29 04:52:04,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.852e+01 9.526e+01 1.029e+02 1.380e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 04:52:15,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3818360.0, ans=0.05 2023-11-29 04:52:30,896 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7650, loss[loss=0.05659, simple_loss=0.07911, pruned_loss=0.008141, audio_tagging_loss=0.008899, over 14842.00 frames. ], tot_loss[loss=0.06379, simple_loss=0.08726, pruned_loss=0.01159, audio_tagging_loss=0.008574, over 3048885.03 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:52:41,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2023-11-29 04:53:00,516 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572800 2023-11-29 04:53:12,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3818693.3333333335, ans=0.025 2023-11-29 04:53:32,437 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7700, loss[loss=0.07971, simple_loss=0.1066, pruned_loss=0.01742, audio_tagging_loss=0.008967, over 17512.00 frames. ], tot_loss[loss=0.06357, simple_loss=0.08707, pruned_loss=0.01147, audio_tagging_loss=0.008571, over 3054317.58 frames. 
], batch size: 64, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:53:35,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3818826.6666666665, ans=0.125 2023-11-29 04:53:35,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2023-11-29 04:53:57,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3818960.0, ans=0.0 2023-11-29 04:53:59,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3818960.0, ans=0.5 2023-11-29 04:54:02,658 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572850 2023-11-29 04:54:09,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.101e+01 9.082e+01 9.588e+01 1.045e+02 1.280e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 04:54:23,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3819093.3333333335, ans=0.2 2023-11-29 04:54:34,876 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7750, loss[loss=0.05949, simple_loss=0.07676, pruned_loss=0.01241, audio_tagging_loss=0.0087, over 15226.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08789, pruned_loss=0.01164, audio_tagging_loss=0.008627, over 3057049.33 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:54:36,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3819160.0, ans=0.125 2023-11-29 04:54:38,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.99 vs. limit=15.0 2023-11-29 04:54:41,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3819160.0, ans=0.1 2023-11-29 04:54:59,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3819293.3333333335, ans=0.125 2023-11-29 04:55:04,140 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572900 2023-11-29 04:55:05,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3819293.3333333335, ans=0.0 2023-11-29 04:55:15,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2023-11-29 04:55:19,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.06 vs. limit=22.5 2023-11-29 04:55:36,099 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7800, loss[loss=0.0674, simple_loss=0.09098, pruned_loss=0.01323, audio_tagging_loss=0.008681, over 15492.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08784, pruned_loss=0.01167, audio_tagging_loss=0.008686, over 3050873.15 frames. 
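The four numbers in each loss record are not independent: the totals are consistent with loss ~ 0.5 * simple_loss + pruned_loss + audio_tagging_loss, with the 0.5 inferred from the printed values themselves (the actual scales come from the training configuration). Checking the batch 7800 record above:

    # tot_loss at batch 7800: loss=0.06428, simple_loss=0.08784,
    # pruned_loss=0.01167, audio_tagging_loss=0.008686
    simple, pruned, tagging = 0.08784, 0.01167, 0.008686
    print(0.5 * simple + pruned + tagging)  # 0.064276 ~ logged 0.06428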
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:56:05,436 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572950 2023-11-29 04:56:12,920 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 9.184e+01 1.003e+02 1.060e+02 1.343e+02, threshold=2.007e+02, percent-clipped=0.0 2023-11-29 04:56:18,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0 2023-11-29 04:56:21,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3819693.3333333335, ans=0.125 2023-11-29 04:56:27,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3819760.0, ans=0.2 2023-11-29 04:56:37,818 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7850, loss[loss=0.04934, simple_loss=0.06765, pruned_loss=0.007253, audio_tagging_loss=0.008261, over 13492.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08785, pruned_loss=0.01175, audio_tagging_loss=0.008632, over 3050585.69 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:56:51,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2023-11-29 04:57:07,128 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573000 2023-11-29 04:57:33,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=15.0 2023-11-29 04:57:39,584 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7900, loss[loss=0.08025, simple_loss=0.1185, pruned_loss=0.01335, audio_tagging_loss=0.00763, over 15524.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08884, pruned_loss=0.01199, audio_tagging_loss=0.00866, over 3051967.11 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:57:49,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3820160.0, ans=0.125 2023-11-29 04:57:50,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3820226.6666666665, ans=0.125 2023-11-29 04:57:58,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3820226.6666666665, ans=0.015 2023-11-29 04:58:09,648 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573050 2023-11-29 04:58:16,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 9.085e+01 9.812e+01 1.049e+02 1.531e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 04:58:16,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3820360.0, ans=0.0 2023-11-29 04:58:19,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3820360.0, ans=0.5 2023-11-29 04:58:24,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3820360.0, ans=0.0 2023-11-29 04:58:34,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3820426.6666666665, ans=0.5 2023-11-29 04:58:41,055 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7950, loss[loss=0.08187, simple_loss=0.1133, pruned_loss=0.01693, audio_tagging_loss=0.008267, over 16095.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08871, pruned_loss=0.01196, audio_tagging_loss=0.008748, over 3049261.03 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:58:46,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3820493.3333333335, ans=0.0 2023-11-29 04:58:49,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3820493.3333333335, ans=15.0 2023-11-29 04:58:56,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3820560.0, ans=0.025 2023-11-29 04:59:00,112 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 04:59:02,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3820560.0, ans=0.125 2023-11-29 04:59:10,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3820626.6666666665, ans=0.125 2023-11-29 04:59:11,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573100 2023-11-29 04:59:11,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3820626.6666666665, ans=0.0 2023-11-29 04:59:14,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-11-29 04:59:22,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3820693.3333333335, ans=0.125 2023-11-29 04:59:43,479 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8000, loss[loss=0.0462, simple_loss=0.06237, pruned_loss=0.006429, audio_tagging_loss=0.008589, over 15740.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08801, pruned_loss=0.01179, audio_tagging_loss=0.008805, over 3048331.97 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:59:47,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3820826.6666666665, ans=0.125 2023-11-29 05:00:08,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3820960.0, ans=0.125 2023-11-29 05:00:09,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3820960.0, ans=0.125 2023-11-29 05:00:12,744 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573150 2023-11-29 05:00:20,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 9.160e+01 9.620e+01 1.029e+02 4.171e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-29 05:00:36,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3821093.3333333335, ans=0.0 2023-11-29 05:00:38,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-29 05:00:45,125 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8050, loss[loss=0.06378, simple_loss=0.09753, pruned_loss=0.008923, audio_tagging_loss=0.006092, over 13867.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.0879, pruned_loss=0.01178, audio_tagging_loss=0.008799, over 3041372.07 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:01:10,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.46 vs. 
limit=22.5 2023-11-29 05:01:14,582 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573200 2023-11-29 05:01:32,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3821360.0, ans=0.125 2023-11-29 05:01:44,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3821426.6666666665, ans=0.125 2023-11-29 05:01:47,026 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8100, loss[loss=0.07025, simple_loss=0.1027, pruned_loss=0.01103, audio_tagging_loss=0.007878, over 14074.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08841, pruned_loss=0.01177, audio_tagging_loss=0.008687, over 3043229.69 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:01:59,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2023-11-29 05:02:00,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3821560.0, ans=0.125 2023-11-29 05:02:02,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3821560.0, ans=0.2 2023-11-29 05:02:16,334 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573250 2023-11-29 05:02:25,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 9.031e+01 9.567e+01 1.056e+02 1.290e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 05:02:32,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2023-11-29 05:02:42,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3821760.0, ans=0.125 2023-11-29 05:02:48,028 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8150, loss[loss=0.08795, simple_loss=0.1239, pruned_loss=0.01963, audio_tagging_loss=0.006354, over 15552.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08888, pruned_loss=0.01185, audio_tagging_loss=0.008432, over 3043417.41 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:03:18,640 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573300 2023-11-29 05:03:43,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3822093.3333333335, ans=0.125 2023-11-29 05:03:50,146 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8200, loss[loss=0.04643, simple_loss=0.05228, pruned_loss=0.005798, audio_tagging_loss=0.01449, over 14687.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08829, pruned_loss=0.01179, audio_tagging_loss=0.008432, over 3045229.82 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:03:54,221 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 05:04:00,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2023-11-29 05:04:19,288 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573350 2023-11-29 05:04:27,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 9.111e+01 9.648e+01 1.058e+02 1.357e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 05:04:48,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3822426.6666666665, ans=0.0 2023-11-29 05:04:51,495 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8250, loss[loss=0.0595, simple_loss=0.08522, pruned_loss=0.00942, audio_tagging_loss=0.007468, over 16431.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.0889, pruned_loss=0.01176, audio_tagging_loss=0.008258, over 3054913.70 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:04:51,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3822493.3333333335, ans=0.0 2023-11-29 05:04:57,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3822493.3333333335, ans=0.125 2023-11-29 05:04:58,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2023-11-29 05:05:17,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3822626.6666666665, ans=0.0 2023-11-29 05:05:18,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3822626.6666666665, ans=0.125 2023-11-29 05:05:20,962 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573400 2023-11-29 05:05:44,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3822760.0, ans=0.0 2023-11-29 05:05:52,749 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8300, loss[loss=0.05944, simple_loss=0.08038, pruned_loss=0.008152, audio_tagging_loss=0.0111, over 14192.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08892, pruned_loss=0.01173, audio_tagging_loss=0.008275, over 3057179.03 frames. 
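The WARNING records above drop AudioSet cuts whose placeholder transcript is longer than the acoustic sequence: 100 input frames subsample to 23, fewer than the 24 BPE tokens, so the transducer has no valid alignment. The check below reproduces the logged 100 -> 23 arithmetic; the subsampling formula and function names are assumptions, with the real filter living in train_asr.py:

    def frames_after_subsampling(num_frames: int) -> int:
        # One conv-subsampling arithmetic that maps 100 -> 23 as logged.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> cut excluded, as logged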
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:05:52,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3822826.6666666665, ans=0.0 2023-11-29 05:06:03,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3822826.6666666665, ans=0.0 2023-11-29 05:06:23,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573450 2023-11-29 05:06:31,401 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.946e+01 9.758e+01 1.060e+02 1.383e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-29 05:06:36,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3823026.6666666665, ans=0.125 2023-11-29 05:06:36,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3823026.6666666665, ans=0.125 2023-11-29 05:06:38,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3823026.6666666665, ans=0.1 2023-11-29 05:06:44,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3823093.3333333335, ans=0.0 2023-11-29 05:06:46,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.21 vs. limit=10.0 2023-11-29 05:06:52,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-29 05:06:54,995 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8350, loss[loss=0.07021, simple_loss=0.1018, pruned_loss=0.009934, audio_tagging_loss=0.009384, over 15589.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.0889, pruned_loss=0.01165, audio_tagging_loss=0.008245, over 3056234.98 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:07:18,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3823293.3333333335, ans=0.0 2023-11-29 05:07:21,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3823293.3333333335, ans=0.09899494936611666 2023-11-29 05:07:24,366 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573500 2023-11-29 05:07:27,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. 
limit=15.0 2023-11-29 05:07:28,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3823293.3333333335, ans=0.125 2023-11-29 05:07:30,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3823360.0, ans=0.0 2023-11-29 05:07:30,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3823360.0, ans=0.125 2023-11-29 05:07:32,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3823360.0, ans=0.125 2023-11-29 05:07:35,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3823360.0, ans=0.125 2023-11-29 05:07:36,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.66 vs. limit=22.5 2023-11-29 05:07:57,400 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8400, loss[loss=0.04809, simple_loss=0.06135, pruned_loss=0.00555, audio_tagging_loss=0.01186, over 14841.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08932, pruned_loss=0.01184, audio_tagging_loss=0.008213, over 3049089.00 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:08:01,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=22.5 2023-11-29 05:08:02,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2023-11-29 05:08:10,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=10.0 2023-11-29 05:08:25,914 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573550 2023-11-29 05:08:36,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.903e+01 9.025e+01 9.772e+01 1.057e+02 1.487e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 05:08:39,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3823693.3333333335, ans=0.125 2023-11-29 05:08:56,892 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8450, loss[loss=0.06618, simple_loss=0.09587, pruned_loss=0.01163, audio_tagging_loss=0.006617, over 16635.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08839, pruned_loss=0.01176, audio_tagging_loss=0.008353, over 3045359.10 frames. ], batch size: 63, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:09:05,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2023-11-29 05:09:12,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2023-11-29 05:09:15,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3823893.3333333335, ans=0.125 2023-11-29 05:09:20,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.73 vs. 
limit=15.0 2023-11-29 05:09:28,166 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573600 2023-11-29 05:09:59,984 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8500, loss[loss=0.07017, simple_loss=0.09473, pruned_loss=0.01359, audio_tagging_loss=0.009221, over 14862.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08819, pruned_loss=0.01176, audio_tagging_loss=0.008433, over 3049302.92 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:10:01,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3824160.0, ans=0.0 2023-11-29 05:10:02,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3824160.0, ans=0.2 2023-11-29 05:10:03,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3824160.0, ans=0.125 2023-11-29 05:10:07,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3824160.0, ans=0.0 2023-11-29 05:10:13,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3824226.6666666665, ans=0.0 2023-11-29 05:10:20,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3824226.6666666665, ans=0.125 2023-11-29 05:10:29,779 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573650 2023-11-29 05:10:39,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.190e+01 9.692e+01 1.077e+02 1.317e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 05:10:49,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3824426.6666666665, ans=0.0 2023-11-29 05:10:49,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.94 vs. limit=22.5 2023-11-29 05:10:50,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3824426.6666666665, ans=0.0 2023-11-29 05:11:02,936 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8550, loss[loss=0.05849, simple_loss=0.07865, pruned_loss=0.008309, audio_tagging_loss=0.01086, over 15494.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08877, pruned_loss=0.01177, audio_tagging_loss=0.008528, over 3054638.37 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:11:10,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2023-11-29 05:11:22,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3824560.0, ans=0.125 2023-11-29 05:11:31,508 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573700 2023-11-29 05:12:03,551 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8600, loss[loss=0.05164, simple_loss=0.06669, pruned_loss=0.008252, audio_tagging_loss=0.01004, over 16113.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08811, pruned_loss=0.01166, audio_tagging_loss=0.008618, over 3052590.65 frames. 
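The scaling.py Whitening records compare a per-module metric against a limit (e.g. "metric=12.73 vs. limit=15.0"). One metric with this behavior, offered as an assumption rather than a copy of icefall's _whitening_metric: with C the channel covariance, num_channels * tr(C^2) / tr(C)^2 equals 1.0 for perfectly white features (C proportional to the identity) and grows with the eigenvalue spread:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one group.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        n = x.shape[1]
        return n * torch.trace(cov @ cov) / torch.trace(cov) ** 2

    x = torch.randn(1000, 384)   # near-white input
    print(whitening_metric(x))   # ~1.0; correlated channels score higher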
], batch size: 62, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:12:04,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3824826.6666666665, ans=0.125 2023-11-29 05:12:09,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3824826.6666666665, ans=0.1 2023-11-29 05:12:19,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3824893.3333333335, ans=0.0 2023-11-29 05:12:32,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3824960.0, ans=0.0 2023-11-29 05:12:33,546 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573750 2023-11-29 05:12:34,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3824960.0, ans=0.125 2023-11-29 05:12:44,138 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 8.900e+01 9.530e+01 1.037e+02 1.292e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 05:12:47,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3825026.6666666665, ans=0.2 2023-11-29 05:12:48,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=22.5 2023-11-29 05:12:50,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3825026.6666666665, ans=0.0 2023-11-29 05:12:51,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3825093.3333333335, ans=0.1 2023-11-29 05:12:56,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3825093.3333333335, ans=0.2 2023-11-29 05:12:59,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-11-29 05:13:04,708 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8650, loss[loss=0.07897, simple_loss=0.1052, pruned_loss=0.01862, audio_tagging_loss=0.007752, over 15636.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08913, pruned_loss=0.0119, audio_tagging_loss=0.008624, over 3055959.13 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:13:23,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3825226.6666666665, ans=0.125 2023-11-29 05:13:34,601 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573800 2023-11-29 05:14:06,968 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8700, loss[loss=0.06868, simple_loss=0.09558, pruned_loss=0.01327, audio_tagging_loss=0.00762, over 16247.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.0894, pruned_loss=0.01195, audio_tagging_loss=0.008514, over 3061266.06 frames. 
], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:14:14,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3825493.3333333335, ans=0.125 2023-11-29 05:14:18,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3825560.0, ans=0.125 2023-11-29 05:14:19,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3825560.0, ans=0.2 2023-11-29 05:14:27,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3825560.0, ans=0.125 2023-11-29 05:14:28,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3825560.0, ans=0.0 2023-11-29 05:14:29,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3825626.6666666665, ans=0.125 2023-11-29 05:14:36,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573850 2023-11-29 05:14:47,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.152e+01 9.894e+01 1.070e+02 1.338e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 05:14:54,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2023-11-29 05:15:00,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3825760.0, ans=0.125 2023-11-29 05:15:08,750 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8750, loss[loss=0.07526, simple_loss=0.1061, pruned_loss=0.01374, audio_tagging_loss=0.008448, over 15749.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08909, pruned_loss=0.01183, audio_tagging_loss=0.008683, over 3050990.77 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:15:23,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3825893.3333333335, ans=0.125 2023-11-29 05:15:27,069 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:15:37,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573900 2023-11-29 05:16:10,218 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8800, loss[loss=0.0615, simple_loss=0.08272, pruned_loss=0.01002, audio_tagging_loss=0.01013, over 15301.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09004, pruned_loss=0.0121, audio_tagging_loss=0.00876, over 3050212.83 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:16:14,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3826160.0, ans=0.1 2023-11-29 05:16:15,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3826160.0, ans=0.2 2023-11-29 05:16:39,795 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573950 2023-11-29 05:16:50,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.120e+01 9.746e+01 1.050e+02 1.300e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 05:16:56,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3826360.0, ans=0.2 2023-11-29 05:17:11,274 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8850, loss[loss=0.08666, simple_loss=0.1149, pruned_loss=0.02206, audio_tagging_loss=0.007154, over 14444.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08964, pruned_loss=0.01213, audio_tagging_loss=0.008776, over 3050826.82 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:17:14,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0 2023-11-29 05:17:26,534 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:17:40,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574000 2023-11-29 05:17:54,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3826693.3333333335, ans=0.0 2023-11-29 05:18:00,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3826760.0, ans=0.125 2023-11-29 05:18:06,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3826760.0, ans=0.2 2023-11-29 05:18:13,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3826826.6666666665, ans=0.1 2023-11-29 05:18:14,006 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8900, loss[loss=0.05068, simple_loss=0.07157, pruned_loss=0.009128, audio_tagging_loss=0.005763, over 14584.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09031, pruned_loss=0.01215, audio_tagging_loss=0.008651, over 3047603.36 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:18:14,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3826826.6666666665, ans=0.125 2023-11-29 05:18:14,283 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:18:43,699 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574050 2023-11-29 05:18:50,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3827026.6666666665, ans=0.1 2023-11-29 05:18:54,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 9.202e+01 9.774e+01 1.025e+02 3.343e+02, threshold=1.955e+02, percent-clipped=1.0 2023-11-29 05:18:56,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3827026.6666666665, ans=0.125 2023-11-29 05:18:58,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3827026.6666666665, ans=0.125 2023-11-29 05:19:05,132 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:19:11,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2023-11-29 05:19:15,220 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8950, loss[loss=0.06272, simple_loss=0.08729, pruned_loss=0.01162, audio_tagging_loss=0.007456, over 14155.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08984, pruned_loss=0.01203, audio_tagging_loss=0.008495, over 3044054.59 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:19:25,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3827160.0, ans=0.0 2023-11-29 05:19:37,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3827226.6666666665, ans=0.1 2023-11-29 05:19:44,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3827293.3333333335, ans=0.1 2023-11-29 05:19:45,573 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574100 2023-11-29 05:20:00,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3827360.0, ans=0.125 2023-11-29 05:20:01,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3827360.0, ans=0.0 2023-11-29 05:20:15,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.28 vs. limit=15.0 2023-11-29 05:20:16,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.17 vs. limit=5.0 2023-11-29 05:20:17,580 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9000, loss[loss=0.06823, simple_loss=0.08302, pruned_loss=0.01775, audio_tagging_loss=0.008971, over 14196.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08952, pruned_loss=0.01205, audio_tagging_loss=0.008423, over 3047485.02 frames. 
2023-11-29 05:20:17,580 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9000, loss[loss=0.06823, simple_loss=0.08302, pruned_loss=0.01775, audio_tagging_loss=0.008971, over 14196.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08952, pruned_loss=0.01205, audio_tagging_loss=0.008423, over 3047485.02 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:20:17,581 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-29 05:20:56,935 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05922, simple_loss=0.05036, pruned_loss=0.00529, audio_tagging_loss=0.02875, over 4681554.00 frames.
2023-11-29 05:20:56,936 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-29 05:20:59,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3827493.3333333335, ans=0.0
2023-11-29 05:21:01,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0
2023-11-29 05:21:04,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3827493.3333333335, ans=0.125
2023-11-29 05:21:04,238 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:21:10,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3827560.0, ans=0.125
2023-11-29 05:21:26,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574150
2023-11-29 05:21:29,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3827626.6666666665, ans=0.125
2023-11-29 05:21:37,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 9.287e+01 9.829e+01 1.058e+02 1.335e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-29 05:21:43,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3827693.3333333335, ans=0.0
2023-11-29 05:21:58,577 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9050, loss[loss=0.07383, simple_loss=0.103, pruned_loss=0.01507, audio_tagging_loss=0.007285, over 14980.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08942, pruned_loss=0.01212, audio_tagging_loss=0.008372, over 3044078.07 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:22:04,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3827826.6666666665, ans=0.1
2023-11-29 05:22:27,882 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574200
2023-11-29 05:22:51,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3828093.3333333335, ans=0.2
2023-11-29 05:23:00,521 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9100, loss[loss=0.07704, simple_loss=0.1066, pruned_loss=0.01721, audio_tagging_loss=0.00655, over 15482.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09051, pruned_loss=0.01226, audio_tagging_loss=0.008226, over 3050102.90 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0
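The logged totals combine the components exactly as the run config weights them (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, with the warm-up weighting long past at this point): for batch 9100 above, 0.5 * 0.09051 + 0.01226 + 0.008226 = 0.06574, and the epoch-48 validation line gives 0.5 * 0.05036 + 0.00529 + 0.02875 = 0.05922. A quick arithmetic check of that relationship:

```python
# Consistency check of the logged tot_loss values against the configured scales:
# loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * at_loss
def total(simple, pruned, at, simple_scale=0.5, at_scale=1.0):
    return simple_scale * simple + pruned + at_scale * at

assert abs(total(0.09051, 0.01226, 0.008226) - 0.06574) < 5e-5  # batch 9100
assert abs(total(0.05036, 0.00529, 0.02875) - 0.05922) < 5e-5   # validation
```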
2023-11-29 05:23:02,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=12.0
2023-11-29 05:23:14,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3828226.6666666665, ans=0.0
2023-11-29 05:23:23,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3828293.3333333335, ans=0.0
2023-11-29 05:23:29,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574250
2023-11-29 05:23:29,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3828293.3333333335, ans=0.125
2023-11-29 05:23:38,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3828360.0, ans=0.0
2023-11-29 05:23:40,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.007e+01 9.515e+01 1.034e+02 1.309e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-29 05:24:01,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0
2023-11-29 05:24:02,089 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9150, loss[loss=0.0614, simple_loss=0.07095, pruned_loss=0.01331, audio_tagging_loss=0.01261, over 14399.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09067, pruned_loss=0.01219, audio_tagging_loss=0.008168, over 3054692.20 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:24:19,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5
2023-11-29 05:24:31,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3828626.6666666665, ans=0.125
2023-11-29 05:24:32,031 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574300
2023-11-29 05:24:32,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3828626.6666666665, ans=0.125
2023-11-29 05:25:01,815 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:25:04,007 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9200, loss[loss=0.06635, simple_loss=0.09918, pruned_loss=0.009343, audio_tagging_loss=0.007418, over 15228.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.09012, pruned_loss=0.01211, audio_tagging_loss=0.00822, over 3054863.97 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:25:06,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3828826.6666666665, ans=0.07
2023-11-29 05:25:07,229 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:25:12,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3828826.6666666665, ans=0.2
2023-11-29 05:25:25,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3828893.3333333335, ans=0.0
2023-11-29 05:25:33,883 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574350
2023-11-29 05:25:44,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 8.927e+01 9.501e+01 1.029e+02 1.392e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-29 05:25:48,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3829026.6666666665, ans=0.05
2023-11-29 05:26:03,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. limit=10.0
2023-11-29 05:26:06,021 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9250, loss[loss=0.09272, simple_loss=0.1333, pruned_loss=0.01918, audio_tagging_loss=0.006866, over 14892.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0904, pruned_loss=0.0121, audio_tagging_loss=0.008225, over 3054296.62 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:26:10,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829160.0, ans=0.1
2023-11-29 05:26:29,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3829293.3333333335, ans=0.125
2023-11-29 05:26:35,603 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574400
2023-11-29 05:26:52,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3829360.0, ans=0.125
2023-11-29 05:27:08,211 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9300, loss[loss=0.0813, simple_loss=0.1158, pruned_loss=0.01537, audio_tagging_loss=0.008034, over 16505.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.09005, pruned_loss=0.01191, audio_tagging_loss=0.008318, over 3064590.10 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:27:30,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3829560.0, ans=0.125
2023-11-29 05:27:36,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3829626.6666666665, ans=0.025
2023-11-29 05:27:37,433 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574450
2023-11-29 05:27:51,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.967e+01 9.597e+01 1.017e+02 1.229e+02, threshold=1.919e+02, percent-clipped=0.0
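In the optim.py lines, the five numbers after "grad-norm quartiles" appear to be the min/25%/median/75%/max of recently observed gradient norms, and the clipping threshold tracks Clipping_scale times the median (here 2.0 * 9.597e+01 = 1.919e+02, matching the logged threshold); percent-clipped reports how often the norm exceeded it, as in the 05:18:54 line where a 3.343e+02 outlier was clipped. A sketch of that bookkeeping, assuming a simple history buffer rather than ScaledAdam's actual implementation:

```python
import torch

# Illustrative version of the quartile-based clipping that optim.py logs
# (ScaledAdam's real implementation differs in details).
class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []  # recent total gradient norms

    def clip_(self, params):
        norm = torch.norm(torch.stack(
            [p.grad.norm() for p in params if p.grad is not None]))
        self.norms = (self.norms + [norm.item()])[-self.history:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm.item() > threshold:                    # counted in percent-clipped
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm.item())
        return q, threshold
```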
2023-11-29 05:27:54,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.14 vs. limit=15.0
2023-11-29 05:27:57,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0
2023-11-29 05:28:00,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3829760.0, ans=0.125
2023-11-29 05:28:01,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3829760.0, ans=0.0
2023-11-29 05:28:09,016 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9350, loss[loss=0.06118, simple_loss=0.08565, pruned_loss=0.01029, audio_tagging_loss=0.008067, over 15349.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08982, pruned_loss=0.01202, audio_tagging_loss=0.008372, over 3055860.80 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:28:24,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3829893.3333333335, ans=0.1
2023-11-29 05:28:39,337 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574500
2023-11-29 05:28:45,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3830026.6666666665, ans=0.0
2023-11-29 05:28:59,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3830093.3333333335, ans=0.0
2023-11-29 05:29:06,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3830093.3333333335, ans=0.125
2023-11-29 05:29:07,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3830093.3333333335, ans=0.1
2023-11-29 05:29:10,054 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9400, loss[loss=0.0577, simple_loss=0.07165, pruned_loss=0.01159, audio_tagging_loss=0.01029, over 14273.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08961, pruned_loss=0.01195, audio_tagging_loss=0.008438, over 3053767.09 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:29:39,420 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574550
2023-11-29 05:29:53,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 9.033e+01 9.709e+01 1.034e+02 1.178e+02, threshold=1.942e+02, percent-clipped=0.0
2023-11-29 05:29:54,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3830360.0, ans=0.125
2023-11-29 05:30:00,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0
2023-11-29 05:30:02,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3830426.6666666665, ans=0.0
2023-11-29 05:30:12,139 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9450, loss[loss=0.05774, simple_loss=0.08036, pruned_loss=0.0106, audio_tagging_loss=0.006965, over 14567.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08972, pruned_loss=0.01205, audio_tagging_loss=0.008538, over 3059443.87 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:30:13,319 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 05:30:41,516 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574600
2023-11-29 05:30:58,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3830693.3333333335, ans=0.125
2023-11-29 05:31:13,402 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9500, loss[loss=0.06551, simple_loss=0.08661, pruned_loss=0.01467, audio_tagging_loss=0.007536, over 14979.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09047, pruned_loss=0.0122, audio_tagging_loss=0.008512, over 3058782.65 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:31:15,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2023-11-29 05:31:44,293 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574650
2023-11-29 05:31:46,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3830960.0, ans=0.09899494936611666
2023-11-29 05:31:56,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.911e+01 9.563e+01 1.027e+02 1.260e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-29 05:32:04,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3831093.3333333335, ans=0.125
2023-11-29 05:32:09,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3831093.3333333335, ans=0.125
2023-11-29 05:32:09,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=15.0
2023-11-29 05:32:15,816 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9550, loss[loss=0.0578, simple_loss=0.07506, pruned_loss=0.01176, audio_tagging_loss=0.008512, over 14672.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09045, pruned_loss=0.01216, audio_tagging_loss=0.00864, over 3054539.52 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:32:16,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3831160.0, ans=0.0
2023-11-29 05:32:21,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3831160.0, ans=0.125
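The ScheduledFloat lines from scaling.py report quantities (dropout rates, skip rates, balancer probabilities, bypass scales) that are not fixed hyperparameters but functions of the global batch_count; "ans" is the value currently in effect. A minimal sketch of such a schedule as a piecewise-linear function of batch count (icefall's ScheduledFloat in scaling.py carries more machinery, and the breakpoints below are purely illustrative):

```python
# Minimal piecewise-linear schedule keyed on a global batch count,
# in the spirit of the ScheduledFloat values logged above.
class PiecewiseLinear:
    def __init__(self, *points):  # points: (batch_count, value), sorted by x
        self.points = list(points)

    def __call__(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint, hold the final value

dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))   # assumed breakpoints
print(dropout_p(3831160.0))  # 0.1: this late in training the schedule is flat
```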
2023-11-29 05:32:29,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0
2023-11-29 05:32:44,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3831293.3333333335, ans=0.0
2023-11-29 05:32:44,976 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574700
2023-11-29 05:33:01,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831360.0, ans=0.1
2023-11-29 05:33:05,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3831426.6666666665, ans=0.0
2023-11-29 05:33:17,848 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9600, loss[loss=0.06475, simple_loss=0.08174, pruned_loss=0.01367, audio_tagging_loss=0.0102, over 12934.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09007, pruned_loss=0.01202, audio_tagging_loss=0.008702, over 3047370.81 frames. ], batch size: 51, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:33:19,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3831493.3333333335, ans=0.125
2023-11-29 05:33:42,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3831626.6666666665, ans=0.125
2023-11-29 05:33:45,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3831626.6666666665, ans=0.1
2023-11-29 05:33:46,173 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574750
2023-11-29 05:33:58,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3831693.3333333335, ans=0.04949747468305833
2023-11-29 05:34:01,055 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 9.154e+01 9.787e+01 1.038e+02 1.402e+02, threshold=1.957e+02, percent-clipped=0.0
2023-11-29 05:34:04,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3831693.3333333335, ans=0.1
2023-11-29 05:34:18,765 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9650, loss[loss=0.06302, simple_loss=0.08013, pruned_loss=0.01357, audio_tagging_loss=0.009384, over 15459.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08963, pruned_loss=0.01205, audio_tagging_loss=0.008647, over 3044267.55 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:34:33,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0
2023-11-29 05:34:34,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3831893.3333333335, ans=0.125
2023-11-29 05:34:37,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0
2023-11-29 05:34:50,477 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574800
2023-11-29 05:35:19,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0
2023-11-29 05:35:20,960 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9700, loss[loss=0.07173, simple_loss=0.1022, pruned_loss=0.01177, audio_tagging_loss=0.00884, over 15261.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08891, pruned_loss=0.01191, audio_tagging_loss=0.008605, over 3041142.08 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:35:32,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3832226.6666666665, ans=0.0
2023-11-29 05:35:40,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0
2023-11-29 05:35:43,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3832226.6666666665, ans=0.0
2023-11-29 05:35:46,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0
2023-11-29 05:35:50,788 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574850
2023-11-29 05:35:51,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3832293.3333333335, ans=0.125
2023-11-29 05:36:01,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3832360.0, ans=0.125
2023-11-29 05:36:03,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0
2023-11-29 05:36:03,815 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.065e+01 9.811e+01 1.054e+02 1.349e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-29 05:36:23,090 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9750, loss[loss=0.05251, simple_loss=0.06815, pruned_loss=0.01055, audio_tagging_loss=0.007884, over 15631.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.0881, pruned_loss=0.01186, audio_tagging_loss=0.008545, over 3039778.98 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:36:24,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3832493.3333333335, ans=0.125
2023-11-29 05:36:31,440 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:36:41,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0
2023-11-29 05:36:46,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3832626.6666666665, ans=0.07
2023-11-29 05:36:50,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3832626.6666666665, ans=0.1
2023-11-29 05:36:51,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574900
2023-11-29 05:37:01,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3832693.3333333335, ans=0.0
2023-11-29 05:37:21,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3832760.0, ans=0.2
2023-11-29 05:37:21,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3832760.0, ans=0.0
2023-11-29 05:37:21,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3832760.0, ans=0.0
2023-11-29 05:37:23,670 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9800, loss[loss=0.0518, simple_loss=0.07291, pruned_loss=0.007287, audio_tagging_loss=0.008058, over 17378.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08784, pruned_loss=0.01174, audio_tagging_loss=0.008461, over 3037326.04 frames. ], batch size: 66, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:37:24,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3832826.6666666665, ans=0.0
2023-11-29 05:37:39,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3832893.3333333335, ans=0.2
2023-11-29 05:37:50,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3832960.0, ans=0.2
2023-11-29 05:37:52,472 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574950
2023-11-29 05:37:55,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3832960.0, ans=0.125
2023-11-29 05:38:05,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 9.329e+01 9.816e+01 1.069e+02 1.352e+02, threshold=1.963e+02, percent-clipped=0.0
2023-11-29 05:38:07,108 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:38:09,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3833026.6666666665, ans=0.0
2023-11-29 05:38:20,042 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
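The Whitening lines from scaling.py compare an anisotropy metric of a module's output covariance against a limit (5.0, 15.0, 22.5 above; the limit itself can be scheduled, as the whitening_limit record further below shows), and the Whiten modules only push a corrective gradient when the metric exceeds its limit, which is why these lines read "metric=... vs. limit=...". One way to compute such a metric, shown purely as an illustration (scaling.py's exact formula may differ): the mean squared eigenvalue of the covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Illustrative anisotropy metric for (num_frames, num_channels) features:
    1.0 when the covariance is proportional to the identity, larger otherwise."""
    x = x - x.mean(dim=0)
    metrics = []
    for g in x.chunk(num_groups, dim=1):          # per-group covariance
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return sum(metrics) / num_groups

x = torch.randn(1000, 384) @ torch.randn(384, 384)  # deliberately correlated
print(whitening_metric(x))  # well above 1.0; white noise would give ~1.0
```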
2023-11-29 05:38:23,361 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9850, loss[loss=0.06157, simple_loss=0.09318, pruned_loss=0.008369, audio_tagging_loss=0.006618, over 15106.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.0881, pruned_loss=0.01181, audio_tagging_loss=0.008366, over 3036347.19 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:38:34,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2023-11-29 05:38:53,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575000
2023-11-29 05:39:14,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3833426.6666666665, ans=0.0
2023-11-29 05:39:20,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3833426.6666666665, ans=0.2
2023-11-29 05:39:24,249 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9900, loss[loss=0.06082, simple_loss=0.08286, pruned_loss=0.01201, audio_tagging_loss=0.007383, over 15167.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08846, pruned_loss=0.01183, audio_tagging_loss=0.008233, over 3037223.53 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:39:27,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3833493.3333333335, ans=0.1
2023-11-29 05:39:31,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3833493.3333333335, ans=0.1
2023-11-29 05:39:32,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3833493.3333333335, ans=0.1
2023-11-29 05:39:36,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.51 vs. limit=6.0
2023-11-29 05:39:41,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3833560.0, ans=0.125
2023-11-29 05:39:45,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3833560.0, ans=0.125
2023-11-29 05:39:53,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575050
2023-11-29 05:39:56,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3833626.6666666665, ans=0.125
2023-11-29 05:40:06,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.137e+01 9.333e+01 9.894e+01 1.049e+02 1.495e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-29 05:40:10,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3833693.3333333335, ans=0.125
2023-11-29 05:40:17,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0
2023-11-29 05:40:23,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. limit=10.0
2023-11-29 05:40:25,382 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9950, loss[loss=0.08485, simple_loss=0.1248, pruned_loss=0.01242, audio_tagging_loss=0.01004, over 16324.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08847, pruned_loss=0.01179, audio_tagging_loss=0.00826, over 3042439.26 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:40:25,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5
2023-11-29 05:40:29,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3833826.6666666665, ans=0.0
2023-11-29 05:40:45,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3833893.3333333335, ans=0.125
2023-11-29 05:40:53,863 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575100
2023-11-29 05:40:56,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3833960.0, ans=0.125
2023-11-29 05:41:03,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0
2023-11-29 05:41:17,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.56 vs. limit=22.5
2023-11-29 05:41:25,623 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10000, loss[loss=0.04802, simple_loss=0.06337, pruned_loss=0.004902, audio_tagging_loss=0.01143, over 14802.00 frames. ], tot_loss[loss=0.06391, simple_loss=0.088, pruned_loss=0.01163, audio_tagging_loss=0.008281, over 3044804.47 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:41:42,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3834226.6666666665, ans=0.1
2023-11-29 05:41:55,762 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575150
2023-11-29 05:41:58,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3834293.3333333335, ans=0.0
2023-11-29 05:42:08,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.882e+01 9.475e+01 1.008e+02 1.351e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-29 05:42:20,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3834426.6666666665, ans=0.0
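The grad_scale values in these records (16 early on, then 32, dropping to 8 around batch 9350 and climbing back to 16 and 32) follow the usual dynamics of PyTorch's AMP loss scaler, which this run enables via use_fp16: True: the scale is halved when a step produces inf/NaN gradients and grows again after a run of clean steps. A minimal self-contained illustration of that behavior (the init_scale and growth_interval below are assumptions, not this run's settings):

```python
import torch

# Minimal illustration of the AMP loss-scale dynamics behind the logged
# grad_scale values: halve on overflow, grow back after clean steps.
model = torch.nn.Linear(10, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

for step in range(3):
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 10, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(opt)       # skipped internally if grads contain inf/NaN
    scaler.update()        # halves or grows the scale; logged as grad_scale
    print(step, scaler.get_scale())
```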
2023-11-29 05:42:26,229 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10050, loss[loss=0.07714, simple_loss=0.107, pruned_loss=0.01299, audio_tagging_loss=0.01063, over 16627.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08837, pruned_loss=0.01171, audio_tagging_loss=0.008394, over 3049362.92 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:42:26,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3834493.3333333335, ans=0.0
2023-11-29 05:42:45,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3834560.0, ans=0.0
2023-11-29 05:42:53,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3834626.6666666665, ans=0.1
2023-11-29 05:42:55,659 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575200
2023-11-29 05:43:03,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3834693.3333333335, ans=0.2
2023-11-29 05:43:17,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3834760.0, ans=0.125
2023-11-29 05:43:28,465 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10100, loss[loss=0.06402, simple_loss=0.08952, pruned_loss=0.01292, audio_tagging_loss=0.006336, over 15281.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.0887, pruned_loss=0.01184, audio_tagging_loss=0.008436, over 3046902.81 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:43:30,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3834826.6666666665, ans=0.0
2023-11-29 05:43:33,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3834826.6666666665, ans=0.2
2023-11-29 05:43:40,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3834893.3333333335, ans=0.05
2023-11-29 05:43:41,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3834893.3333333335, ans=0.1
2023-11-29 05:43:48,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3834893.3333333335, ans=0.0
2023-11-29 05:43:56,890 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575250
2023-11-29 05:44:08,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3835026.6666666665, ans=0.0
2023-11-29 05:44:10,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 9.243e+01 9.943e+01 1.062e+02 1.322e+02, threshold=1.989e+02, percent-clipped=0.0
2023-11-29 05:44:19,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3835093.3333333335, ans=0.125
2023-11-29 05:44:20,557 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 05:44:28,565 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10150, loss[loss=0.04565, simple_loss=0.06283, pruned_loss=0.00495, audio_tagging_loss=0.009285, over 15182.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08943, pruned_loss=0.01189, audio_tagging_loss=0.008514, over 3049025.73 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:44:57,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575300
2023-11-29 05:45:00,026 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 05:45:03,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3835293.3333333335, ans=0.0
2023-11-29 05:45:24,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3835426.6666666665, ans=0.2
2023-11-29 05:45:25,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0
2023-11-29 05:45:26,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3835426.6666666665, ans=0.125
2023-11-29 05:45:28,652 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10200, loss[loss=0.08317, simple_loss=0.1182, pruned_loss=0.01651, audio_tagging_loss=0.007572, over 15689.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08944, pruned_loss=0.01184, audio_tagging_loss=0.008571, over 3050340.58 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:45:38,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3835493.3333333335, ans=0.125
2023-11-29 05:45:39,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3835493.3333333335, ans=0.0
2023-11-29 05:45:54,727 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 05:45:58,177 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575350
2023-11-29 05:46:10,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3835693.3333333335, ans=0.1
2023-11-29 05:46:12,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 8.881e+01 9.698e+01 1.024e+02 1.374e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-29 05:46:15,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3835693.3333333335, ans=0.125
2023-11-29 05:46:27,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3835760.0, ans=0.125
2023-11-29 05:46:27,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3835760.0, ans=0.2
2023-11-29 05:46:29,776 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10250, loss[loss=0.04793, simple_loss=0.05851, pruned_loss=0.009345, audio_tagging_loss=0.009326, over 17322.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08961, pruned_loss=0.01197, audio_tagging_loss=0.008593, over 3049669.25 frames. ], batch size: 68, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:46:34,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3835826.6666666665, ans=0.125
2023-11-29 05:46:45,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0
2023-11-29 05:46:53,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0
2023-11-29 05:46:54,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3835960.0, ans=0.125
2023-11-29 05:46:58,953 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575400
2023-11-29 05:47:22,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3836093.3333333335, ans=0.035
2023-11-29 05:47:27,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3836093.3333333335, ans=0.125
2023-11-29 05:47:27,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3836093.3333333335, ans=0.125
2023-11-29 05:47:30,873 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10300, loss[loss=0.06405, simple_loss=0.07958, pruned_loss=0.0136, audio_tagging_loss=0.01066, over 14877.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08851, pruned_loss=0.01192, audio_tagging_loss=0.008663, over 3048846.65 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:47:40,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3836160.0, ans=0.125
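The constant "lr: 1.40e-03" in these records is what the Eden scheduler yields this deep into training: with base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 from the run config, evaluating the schedule at roughly batch idx 575000 in epoch 48 reproduces the logged value (using the whole-number epoch here is an approximation; the scheduler's exact epoch fraction mid-epoch may differ):

```python
# Eden learning-rate check against the logged "lr: 1.40e-03"
# (base_lr, lr_batches, lr_epochs come from the run config; epoch=48 exactly
# is an approximation).
base_lr, lr_batches, lr_epochs = 0.045, 7500, 3.5
batch, epoch = 575000, 48

lr = (base_lr
      * ((batch / lr_batches) ** 2 + 1) ** -0.25
      * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)
print(f"{lr:.2e}")  # ~1.39e-03, matching the rounded 1.40e-03 in the log
```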
2023-11-29 05:47:44,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0
2023-11-29 05:48:00,368 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575450
2023-11-29 05:48:00,514 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:48:14,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.198e+01 9.831e+01 1.050e+02 1.376e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-29 05:48:31,801 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10350, loss[loss=0.06464, simple_loss=0.08751, pruned_loss=0.009837, audio_tagging_loss=0.01105, over 15024.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08972, pruned_loss=0.01209, audio_tagging_loss=0.008735, over 3051503.26 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:48:45,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3836560.0, ans=0.125
2023-11-29 05:48:49,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3836560.0, ans=0.0
2023-11-29 05:48:52,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3836560.0, ans=0.1
2023-11-29 05:48:58,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3836626.6666666665, ans=0.0
2023-11-29 05:49:01,367 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575500
2023-11-29 05:49:13,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3836693.3333333335, ans=0.0
2023-11-29 05:49:31,938 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10400, loss[loss=0.05738, simple_loss=0.0743, pruned_loss=0.008808, audio_tagging_loss=0.01142, over 14588.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08921, pruned_loss=0.01195, audio_tagging_loss=0.00885, over 3051822.66 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:49:48,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0
2023-11-29 05:49:57,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3836960.0, ans=0.1
2023-11-29 05:50:00,073 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:50:00,919 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575550
2023-11-29 05:50:01,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3836960.0, ans=0.125
2023-11-29 05:50:12,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=12.0
2023-11-29 05:50:15,794 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 9.181e+01 9.898e+01 1.077e+02 1.252e+02, threshold=1.980e+02, percent-clipped=0.0
2023-11-29 05:50:27,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3837093.3333333335, ans=0.125
2023-11-29 05:50:32,037 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10450, loss[loss=0.05262, simple_loss=0.06937, pruned_loss=0.009961, audio_tagging_loss=0.007974, over 15224.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08906, pruned_loss=0.01178, audio_tagging_loss=0.00881, over 3055788.90 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:50:39,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3837160.0, ans=0.125
2023-11-29 05:51:02,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575600
2023-11-29 05:51:18,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3837360.0, ans=0.125
2023-11-29 05:51:33,372 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10500, loss[loss=0.05147, simple_loss=0.0738, pruned_loss=0.004287, audio_tagging_loss=0.01029, over 14632.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08827, pruned_loss=0.01169, audio_tagging_loss=0.008631, over 3053949.06 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:51:34,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3837493.3333333335, ans=0.05
2023-11-29 05:51:45,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3837560.0, ans=0.0
2023-11-29 05:51:47,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3837560.0, ans=0.125
2023-11-29 05:51:49,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0
2023-11-29 05:52:01,908 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575650
2023-11-29 05:52:16,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 9.059e+01 9.610e+01 1.014e+02 2.042e+02, threshold=1.922e+02, percent-clipped=1.0
2023-11-29 05:52:22,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837760.0, ans=0.1
2023-11-29 05:52:25,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3837760.0, ans=0.1
2023-11-29 05:52:27,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3837760.0, ans=0.125
2023-11-29 05:52:33,993 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10550, loss[loss=0.06109, simple_loss=0.07488, pruned_loss=0.0133, audio_tagging_loss=0.01035, over 15851.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08832, pruned_loss=0.01173, audio_tagging_loss=0.008563, over 3048102.12 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:52:54,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0
2023-11-29 05:52:58,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3837960.0, ans=0.125
2023-11-29 05:53:02,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3837960.0, ans=0.2
2023-11-29 05:53:03,037 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575700
2023-11-29 05:53:16,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0
2023-11-29 05:53:18,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0
2023-11-29 05:53:21,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.66 vs. limit=15.0
2023-11-29 05:53:27,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3838093.3333333335, ans=0.125
2023-11-29 05:53:34,001 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10600, loss[loss=0.06505, simple_loss=0.09697, pruned_loss=0.0115, audio_tagging_loss=0.005067, over 15820.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08881, pruned_loss=0.01174, audio_tagging_loss=0.008417, over 3045437.37 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:53:47,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3838226.6666666665, ans=0.1
2023-11-29 05:54:04,192 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575750
2023-11-29 05:54:11,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0
2023-11-29 05:54:14,694 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:54:18,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.196e+01 9.659e+01 1.047e+02 1.317e+02, threshold=1.932e+02, percent-clipped=0.0
2023-11-29 05:54:20,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3838360.0, ans=0.015
2023-11-29 05:54:20,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3838360.0, ans=0.125
2023-11-29 05:54:34,916 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10650, loss[loss=0.07231, simple_loss=0.09407, pruned_loss=0.01353, audio_tagging_loss=0.01176, over 13831.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08862, pruned_loss=0.01186, audio_tagging_loss=0.008447, over 3041554.08 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:54:35,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0
2023-11-29 05:54:50,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3838560.0, ans=0.125
2023-11-29 05:55:01,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3838626.6666666665, ans=0.125
2023-11-29 05:55:02,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3838626.6666666665, ans=0.1
2023-11-29 05:55:03,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575800
2023-11-29 05:55:11,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0
2023-11-29 05:55:36,148 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10700, loss[loss=0.05433, simple_loss=0.07267, pruned_loss=0.009846, audio_tagging_loss=0.00815, over 15431.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08825, pruned_loss=0.01186, audio_tagging_loss=0.008524, over 3037786.64 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:55:52,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3838893.3333333335, ans=0.125
2023-11-29 05:56:04,152 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575850
2023-11-29 05:56:21,883 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.915e+01 9.906e+01 1.068e+02 1.666e+02, threshold=1.981e+02, percent-clipped=0.0
2023-11-29 05:56:23,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3839093.3333333335, ans=0.125
2023-11-29 05:56:30,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3839093.3333333335, ans=0.125
2023-11-29 05:56:35,678 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10750, loss[loss=0.06021, simple_loss=0.07666, pruned_loss=0.01067, audio_tagging_loss=0.01122, over 16126.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08901, pruned_loss=0.01196, audio_tagging_loss=0.008546, over 3044655.50 frames. ], batch size: 63, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:56:37,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0
2023-11-29 05:56:40,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3839160.0, ans=0.2
2023-11-29 05:57:00,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3839293.3333333335, ans=0.0
2023-11-29 05:57:05,914 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575900
2023-11-29 05:57:14,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3839360.0, ans=15.0
2023-11-29 05:57:15,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3839360.0, ans=0.0
2023-11-29 05:57:17,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3839360.0, ans=0.1
2023-11-29 05:57:36,389 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10800, loss[loss=0.05444, simple_loss=0.07815, pruned_loss=0.00683, audio_tagging_loss=0.008532, over 15373.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08865, pruned_loss=0.01184, audio_tagging_loss=0.00854, over 3042879.58 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:57:39,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3839493.3333333335, ans=0.0
2023-11-29 05:57:46,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3839493.3333333335, ans=0.125
2023-11-29 05:57:46,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3839493.3333333335, ans=0.05
2023-11-29 05:57:56,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3839560.0, ans=0.1
2023-11-29 05:57:57,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3839560.0, ans=0.1
2023-11-29 05:58:01,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3839626.6666666665, ans=0.1
2023-11-29 05:58:04,879 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575950
2023-11-29 05:58:10,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3839693.3333333335, ans=0.125
2023-11-29 05:58:12,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.31 vs. limit=12.0
2023-11-29 05:58:12,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3839693.3333333335, ans=0.125
2023-11-29 05:58:17,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3839693.3333333335, ans=0.125
2023-11-29 05:58:21,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.991e+01 9.894e+01 1.056e+02 1.415e+02, threshold=1.979e+02, percent-clipped=0.0
], tot_loss[loss=0.06443, simple_loss=0.08827, pruned_loss=0.01175, audio_tagging_loss=0.008543, over 3044176.59 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:58:47,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3839893.3333333335, ans=0.125 2023-11-29 05:59:05,486 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576000 2023-11-29 05:59:23,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3840026.6666666665, ans=0.2 2023-11-29 05:59:39,106 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:59:40,186 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10900, loss[loss=0.06245, simple_loss=0.07994, pruned_loss=0.01155, audio_tagging_loss=0.01093, over 15196.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08876, pruned_loss=0.01179, audio_tagging_loss=0.008509, over 3043636.56 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:59:49,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3840160.0, ans=0.0 2023-11-29 06:00:07,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3840293.3333333335, ans=0.0 2023-11-29 06:00:09,742 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576050 2023-11-29 06:00:27,252 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.980e+01 9.608e+01 1.052e+02 1.228e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-29 06:00:41,328 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10950, loss[loss=0.07755, simple_loss=0.1088, pruned_loss=0.01448, audio_tagging_loss=0.008696, over 14788.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08913, pruned_loss=0.01189, audio_tagging_loss=0.008507, over 3039498.74 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:00:50,472 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:00:54,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3840560.0, ans=0.125 2023-11-29 06:00:57,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3840560.0, ans=10.0 2023-11-29 06:01:12,287 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576100 2023-11-29 06:01:12,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3840626.6666666665, ans=0.125 2023-11-29 06:01:13,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3840626.6666666665, ans=0.2 2023-11-29 06:01:17,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3840626.6666666665, ans=0.0 2023-11-29 06:01:25,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3840693.3333333335, ans=0.2 2023-11-29 06:01:36,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3840760.0, ans=0.1 2023-11-29 06:01:43,987 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11000, loss[loss=0.05717, simple_loss=0.07366, pruned_loss=0.008402, audio_tagging_loss=0.01194, over 14940.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08952, pruned_loss=0.01186, audio_tagging_loss=0.00859, over 3048226.52 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:01:47,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3840826.6666666665, ans=0.125 2023-11-29 06:01:56,953 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:01:57,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3840893.3333333335, ans=0.2 2023-11-29 06:02:13,336 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576150 2023-11-29 06:02:24,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3841026.6666666665, ans=0.035 2023-11-29 06:02:30,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.865e+01 9.537e+01 1.028e+02 1.366e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 06:02:38,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3841093.3333333335, ans=0.125 2023-11-29 06:02:45,430 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11050, loss[loss=0.04938, simple_loss=0.06412, pruned_loss=0.008792, audio_tagging_loss=0.008524, over 15332.00 frames. 
], tot_loss[loss=0.06467, simple_loss=0.08853, pruned_loss=0.01177, audio_tagging_loss=0.008638, over 3045766.62 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:02:52,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3841160.0, ans=0.1 2023-11-29 06:02:55,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3841160.0, ans=0.0 2023-11-29 06:02:56,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3841226.6666666665, ans=0.0 2023-11-29 06:03:05,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3841226.6666666665, ans=0.0 2023-11-29 06:03:14,274 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576200 2023-11-29 06:03:15,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3841293.3333333335, ans=0.125 2023-11-29 06:03:37,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3841426.6666666665, ans=0.0 2023-11-29 06:03:39,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3841426.6666666665, ans=0.125 2023-11-29 06:03:43,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3841426.6666666665, ans=0.125 2023-11-29 06:03:47,126 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11100, loss[loss=0.07854, simple_loss=0.1114, pruned_loss=0.01364, audio_tagging_loss=0.009174, over 15096.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.0893, pruned_loss=0.01181, audio_tagging_loss=0.008694, over 3049267.68 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:03:52,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3841493.3333333335, ans=0.125 2023-11-29 06:04:17,525 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576250 2023-11-29 06:04:23,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3841693.3333333335, ans=0.125 2023-11-29 06:04:33,961 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 9.155e+01 9.764e+01 1.030e+02 1.216e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 06:04:41,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.49 vs. limit=10.0 2023-11-29 06:04:48,695 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11150, loss[loss=0.07833, simple_loss=0.1098, pruned_loss=0.01361, audio_tagging_loss=0.009823, over 15883.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08927, pruned_loss=0.01181, audio_tagging_loss=0.008776, over 3052669.32 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:05:03,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3841893.3333333335, ans=0.125 2023-11-29 06:05:18,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576300 2023-11-29 06:05:22,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3841960.0, ans=0.125 2023-11-29 06:05:23,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3841960.0, ans=10.0 2023-11-29 06:05:40,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3842093.3333333335, ans=0.2 2023-11-29 06:05:51,027 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11200, loss[loss=0.0614, simple_loss=0.08604, pruned_loss=0.009987, audio_tagging_loss=0.008393, over 15380.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08933, pruned_loss=0.01176, audio_tagging_loss=0.008805, over 3048846.51 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 06:05:56,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.18 vs. limit=10.0 2023-11-29 06:05:57,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2023-11-29 06:06:17,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3842293.3333333335, ans=0.07 2023-11-29 06:06:19,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576350 2023-11-29 06:06:37,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.953e+01 9.671e+01 1.033e+02 1.680e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 06:06:43,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3842426.6666666665, ans=0.125 2023-11-29 06:06:47,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3842426.6666666665, ans=10.0 2023-11-29 06:06:48,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2023-11-29 06:06:51,544 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11250, loss[loss=0.0726, simple_loss=0.1067, pruned_loss=0.012, audio_tagging_loss=0.007237, over 14769.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08937, pruned_loss=0.01189, audio_tagging_loss=0.008812, over 3049722.61 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 06:07:20,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3842626.6666666665, ans=0.1 2023-11-29 06:07:21,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576400 2023-11-29 06:07:37,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3842693.3333333335, ans=0.125 2023-11-29 06:07:52,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3842826.6666666665, ans=0.2 2023-11-29 06:07:53,059 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11300, loss[loss=0.07129, simple_loss=0.09903, pruned_loss=0.01503, audio_tagging_loss=0.006745, over 16681.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08905, pruned_loss=0.01202, audio_tagging_loss=0.008745, over 3053863.68 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:08:11,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3842893.3333333335, ans=0.0 2023-11-29 06:08:12,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3842893.3333333335, ans=0.0 2023-11-29 06:08:15,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3842893.3333333335, ans=0.0 2023-11-29 06:08:19,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3842960.0, ans=0.07 2023-11-29 06:08:23,074 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576450 2023-11-29 06:08:42,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.108e+01 9.647e+01 1.054e+02 1.325e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 06:08:55,253 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11350, loss[loss=0.0739, simple_loss=0.09917, pruned_loss=0.01545, audio_tagging_loss=0.008867, over 15756.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08908, pruned_loss=0.01215, audio_tagging_loss=0.008658, over 3047129.70 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:09:04,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3843160.0, ans=0.0 2023-11-29 06:09:14,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3843226.6666666665, ans=0.125 2023-11-29 06:09:24,760 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576500 2023-11-29 06:09:56,516 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11400, loss[loss=0.05957, simple_loss=0.07605, pruned_loss=0.00839, audio_tagging_loss=0.01315, over 15617.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08973, pruned_loss=0.0123, audio_tagging_loss=0.008552, over 3042010.50 frames. 
], batch size: 62, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:10:07,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3843560.0, ans=0.125 2023-11-29 06:10:10,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3843560.0, ans=0.125 2023-11-29 06:10:14,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3843560.0, ans=0.125 2023-11-29 06:10:20,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.98 vs. limit=8.0 2023-11-29 06:10:26,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576550 2023-11-29 06:10:26,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2023-11-29 06:10:30,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3843626.6666666665, ans=0.125 2023-11-29 06:10:43,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=12.0 2023-11-29 06:10:45,566 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.262e+01 9.236e+01 1.000e+02 1.071e+02 1.321e+02, threshold=2.000e+02, percent-clipped=0.0 2023-11-29 06:10:55,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3843760.0, ans=0.125 2023-11-29 06:10:57,916 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11450, loss[loss=0.06719, simple_loss=0.09747, pruned_loss=0.01122, audio_tagging_loss=0.007233, over 15354.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.0907, pruned_loss=0.01225, audio_tagging_loss=0.008445, over 3049990.11 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:11:28,123 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576600 2023-11-29 06:11:28,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3843960.0, ans=0.125 2023-11-29 06:11:43,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3844026.6666666665, ans=0.125 2023-11-29 06:11:44,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3844026.6666666665, ans=0.125 2023-11-29 06:11:57,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3844093.3333333335, ans=0.125 2023-11-29 06:11:58,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3844093.3333333335, ans=0.125 2023-11-29 06:12:00,433 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11500, loss[loss=0.06686, simple_loss=0.09132, pruned_loss=0.01271, audio_tagging_loss=0.008491, over 15193.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09023, pruned_loss=0.01211, audio_tagging_loss=0.008495, over 3049833.05 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:12:04,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3844160.0, ans=0.07 2023-11-29 06:12:16,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3844226.6666666665, ans=0.125 2023-11-29 06:12:16,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-29 06:12:21,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3844226.6666666665, ans=0.125 2023-11-29 06:12:30,141 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576650 2023-11-29 06:12:50,852 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.910e+01 9.568e+01 1.052e+02 1.642e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 06:13:02,170 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11550, loss[loss=0.0509, simple_loss=0.06333, pruned_loss=0.009338, audio_tagging_loss=0.009895, over 14111.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08886, pruned_loss=0.01181, audio_tagging_loss=0.008565, over 3047704.47 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:13:10,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3844493.3333333335, ans=0.1 2023-11-29 06:13:32,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576700 2023-11-29 06:13:38,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2023-11-29 06:13:42,660 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:13:54,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3844760.0, ans=0.0 2023-11-29 06:13:54,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0 2023-11-29 06:14:01,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3844760.0, ans=0.1 2023-11-29 06:14:03,737 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11600, loss[loss=0.06685, simple_loss=0.08867, pruned_loss=0.01272, audio_tagging_loss=0.009801, over 14799.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.0894, pruned_loss=0.01179, audio_tagging_loss=0.008518, over 3045626.11 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:14:05,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3844826.6666666665, ans=0.125 2023-11-29 06:14:15,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3844893.3333333335, ans=0.125 2023-11-29 06:14:26,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3844960.0, ans=0.0 2023-11-29 06:14:32,672 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576750 2023-11-29 06:14:48,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3845026.6666666665, ans=0.125 2023-11-29 06:14:51,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3845093.3333333335, ans=0.125 2023-11-29 06:14:55,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.877e+01 9.031e+01 9.516e+01 1.044e+02 1.307e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 06:14:59,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3845093.3333333335, ans=0.1 2023-11-29 06:15:03,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3845093.3333333335, ans=0.1 2023-11-29 06:15:05,666 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11650, loss[loss=0.06485, simple_loss=0.09111, pruned_loss=0.01265, audio_tagging_loss=0.00665, over 16204.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08976, pruned_loss=0.0119, audio_tagging_loss=0.008471, over 3047708.33 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:15:19,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3845226.6666666665, ans=0.07 2023-11-29 06:15:25,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3845226.6666666665, ans=0.1 2023-11-29 06:15:27,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3845226.6666666665, ans=0.125 2023-11-29 06:15:35,345 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576800 2023-11-29 06:15:44,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3845360.0, ans=0.125 2023-11-29 06:15:55,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3845426.6666666665, ans=0.05 2023-11-29 06:16:07,149 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11700, loss[loss=0.04022, simple_loss=0.05545, pruned_loss=0.004855, audio_tagging_loss=0.007635, over 15950.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08847, pruned_loss=0.01156, audio_tagging_loss=0.008567, over 3043687.15 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:16:22,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3845560.0, ans=0.125 2023-11-29 06:16:37,039 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576850 2023-11-29 06:16:58,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.889e+01 9.558e+01 1.009e+02 1.379e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-29 06:17:01,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2023-11-29 06:17:09,134 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11750, loss[loss=0.06333, simple_loss=0.08473, pruned_loss=0.01184, audio_tagging_loss=0.009119, over 15468.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.0883, pruned_loss=0.01155, audio_tagging_loss=0.008676, over 3051819.97 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:17:09,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=12.0 2023-11-29 06:17:32,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3845960.0, ans=0.2 2023-11-29 06:17:38,164 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576900 2023-11-29 06:17:49,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3846026.6666666665, ans=0.125 2023-11-29 06:17:51,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3846026.6666666665, ans=0.0 2023-11-29 06:17:53,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3846026.6666666665, ans=0.2 2023-11-29 06:18:09,230 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:18:10,111 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11800, loss[loss=0.06415, simple_loss=0.08566, pruned_loss=0.01191, audio_tagging_loss=0.00942, over 15649.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.08764, pruned_loss=0.01146, audio_tagging_loss=0.008759, over 3045655.87 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:18:27,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2023-11-29 06:18:35,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=22.5 2023-11-29 06:18:38,809 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576950 2023-11-29 06:18:52,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3846360.0, ans=0.125 2023-11-29 06:18:52,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3846360.0, ans=0.0 2023-11-29 06:18:57,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. 
limit=15.0 2023-11-29 06:19:01,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 9.085e+01 9.909e+01 1.081e+02 1.450e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 06:19:03,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2023-11-29 06:19:10,572 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11850, loss[loss=0.07726, simple_loss=0.1086, pruned_loss=0.01552, audio_tagging_loss=0.007433, over 15159.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08773, pruned_loss=0.01149, audio_tagging_loss=0.008829, over 3050445.21 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:19:23,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3846560.0, ans=0.125 2023-11-29 06:19:38,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3846626.6666666665, ans=0.125 2023-11-29 06:19:40,300 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577000 2023-11-29 06:20:00,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2023-11-29 06:20:11,086 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11900, loss[loss=0.05988, simple_loss=0.07961, pruned_loss=0.01069, audio_tagging_loss=0.009379, over 16468.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08843, pruned_loss=0.01162, audio_tagging_loss=0.008852, over 3055112.02 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:20:13,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3846826.6666666665, ans=0.125 2023-11-29 06:20:18,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3846826.6666666665, ans=0.0 2023-11-29 06:20:30,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3846893.3333333335, ans=0.125 2023-11-29 06:20:36,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.10 vs. limit=22.5 2023-11-29 06:20:41,437 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577050 2023-11-29 06:20:43,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-11-29 06:20:43,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.83 vs. limit=10.0 2023-11-29 06:20:54,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. 
limit=15.0 2023-11-29 06:20:59,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3847093.3333333335, ans=0.1 2023-11-29 06:21:02,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 9.006e+01 9.638e+01 1.018e+02 1.407e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 06:21:07,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3847093.3333333335, ans=0.125 2023-11-29 06:21:10,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3847093.3333333335, ans=0.1 2023-11-29 06:21:13,620 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11950, loss[loss=0.06652, simple_loss=0.08361, pruned_loss=0.01314, audio_tagging_loss=0.01157, over 15407.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08857, pruned_loss=0.01187, audio_tagging_loss=0.008979, over 3055056.18 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:21:16,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3847160.0, ans=0.125 2023-11-29 06:21:18,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=22.5 2023-11-29 06:21:21,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2023-11-29 06:21:31,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3847226.6666666665, ans=0.0 2023-11-29 06:21:42,220 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577100 2023-11-29 06:21:42,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3847293.3333333335, ans=0.125 2023-11-29 06:22:12,415 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 12000, loss[loss=0.05952, simple_loss=0.07884, pruned_loss=0.008175, audio_tagging_loss=0.01192, over 14242.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08854, pruned_loss=0.01173, audio_tagging_loss=0.008969, over 3053979.39 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:22:12,416 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 06:22:40,963 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1605, 2.9219, 2.8109, 2.7209, 3.3030, 3.3076, 3.1566, 3.6214], device='cuda:3') 2023-11-29 06:22:52,513 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05839, simple_loss=0.05056, pruned_loss=0.005496, audio_tagging_loss=0.02761, over 4681554.00 frames. 
2023-11-29 06:22:52,514 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 06:22:52,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3847493.3333333335, ans=0.05 2023-11-29 06:23:04,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3847560.0, ans=0.125 2023-11-29 06:23:08,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3847560.0, ans=0.0 2023-11-29 06:23:12,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3847560.0, ans=0.07 2023-11-29 06:23:14,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3847626.6666666665, ans=0.125 2023-11-29 06:23:44,068 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 0, loss[loss=0.07228, simple_loss=0.08692, pruned_loss=0.01102, audio_tagging_loss=0.0178, over 16317.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.08692, pruned_loss=0.01102, audio_tagging_loss=0.0178, over 16317.00 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:23:44,069 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 06:24:02,469 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2991, 4.8140, 5.1958, 4.5242], device='cuda:3') 2023-11-29 06:24:12,338 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3367, 4.3099, 4.4931, 4.4602], device='cuda:3') 2023-11-29 06:24:16,379 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8242, 4.9879, 5.0948, 4.9326], device='cuda:3') 2023-11-29 06:24:20,369 INFO [train_asr.py:1267] (3/4) Epoch 49, validation: loss=0.05827, simple_loss=0.05045, pruned_loss=0.005376, audio_tagging_loss=0.02767, over 4681554.00 frames. 2023-11-29 06:24:20,370 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 06:24:20,470 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577150 2023-11-29 06:24:42,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2023-11-29 06:24:42,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.193e+01 9.994e+01 1.113e+02 1.489e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 06:25:02,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3847853.3333333335, ans=0.2 2023-11-29 06:25:07,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2023-11-29 06:25:22,984 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 50, loss[loss=0.05892, simple_loss=0.06047, pruned_loss=0.01066, audio_tagging_loss=0.01803, over 16321.00 frames. ], tot_loss[loss=0.07242, simple_loss=0.0891, pruned_loss=0.01156, audio_tagging_loss=0.01631, over 691103.02 frames. 
], batch size: 63, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:25:23,076 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577200 2023-11-29 06:25:29,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3847986.6666666665, ans=0.0 2023-11-29 06:25:58,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.27 vs. limit=22.5 2023-11-29 06:26:08,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3848186.6666666665, ans=0.125 2023-11-29 06:26:25,055 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 100, loss[loss=0.0544, simple_loss=0.06638, pruned_loss=0.00703, audio_tagging_loss=0.01418, over 14596.00 frames. ], tot_loss[loss=0.07301, simple_loss=0.09035, pruned_loss=0.01221, audio_tagging_loss=0.01563, over 1211588.14 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:26:25,152 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577250 2023-11-29 06:26:28,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3848320.0, ans=15.0 2023-11-29 06:26:35,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-29 06:26:36,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=22.5 2023-11-29 06:26:45,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3848386.6666666665, ans=0.125 2023-11-29 06:26:49,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 9.815e+01 1.050e+02 1.112e+02 1.329e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-29 06:26:51,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.76 vs. limit=12.0 2023-11-29 06:26:52,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2023-11-29 06:27:27,358 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 150, loss[loss=0.08212, simple_loss=0.1145, pruned_loss=0.01533, audio_tagging_loss=0.009518, over 15637.00 frames. ], tot_loss[loss=0.07001, simple_loss=0.08846, pruned_loss=0.01176, audio_tagging_loss=0.01402, over 1617376.87 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:27:27,442 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577300 2023-11-29 06:27:47,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. 
limit=15.0 2023-11-29 06:28:14,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3848853.3333333335, ans=0.125 2023-11-29 06:28:17,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3848920.0, ans=0.125 2023-11-29 06:28:31,052 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 200, loss[loss=0.0667, simple_loss=0.0883, pruned_loss=0.01313, audio_tagging_loss=0.00942, over 16015.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.08831, pruned_loss=0.01155, audio_tagging_loss=0.0124, over 1933053.81 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:28:31,129 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577350 2023-11-29 06:28:36,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=22.5 2023-11-29 06:28:42,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-29 06:28:45,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-29 06:28:53,801 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.143e+01 9.385e+01 9.861e+01 1.084e+02 1.515e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-29 06:29:11,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3849186.6666666665, ans=0.0 2023-11-29 06:29:14,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. limit=15.0 2023-11-29 06:29:26,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3849253.3333333335, ans=0.0 2023-11-29 06:29:31,520 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 250, loss[loss=0.07644, simple_loss=0.1057, pruned_loss=0.01519, audio_tagging_loss=0.008384, over 16244.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.08823, pruned_loss=0.01159, audio_tagging_loss=0.01137, over 2182775.34 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:29:31,605 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577400 2023-11-29 06:29:35,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3849320.0, ans=10.0 2023-11-29 06:29:57,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=16.10 vs. limit=15.0 2023-11-29 06:30:06,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3849453.3333333335, ans=0.125 2023-11-29 06:30:07,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0 2023-11-29 06:30:17,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.47 vs. 
limit=15.0 2023-11-29 06:30:34,224 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 300, loss[loss=0.0867, simple_loss=0.1174, pruned_loss=0.02065, audio_tagging_loss=0.00736, over 14792.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09019, pruned_loss=0.012, audio_tagging_loss=0.01053, over 2371774.92 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:30:34,328 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577450 2023-11-29 06:30:39,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2023-11-29 06:30:58,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.309e+01 1.014e+02 1.083e+02 1.326e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-29 06:31:07,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849786.6666666665, ans=0.1 2023-11-29 06:31:09,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3849786.6666666665, ans=0.125 2023-11-29 06:31:22,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3849920.0, ans=0.95 2023-11-29 06:31:30,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3849920.0, ans=0.125 2023-11-29 06:31:37,011 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 350, loss[loss=0.06263, simple_loss=0.0959, pruned_loss=0.007921, audio_tagging_loss=0.006759, over 14795.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08871, pruned_loss=0.01176, audio_tagging_loss=0.01016, over 2525613.38 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:31:37,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577500 2023-11-29 06:32:19,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.72 vs. limit=15.0 2023-11-29 06:32:35,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0 2023-11-29 06:32:39,078 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 400, loss[loss=0.07301, simple_loss=0.1053, pruned_loss=0.01465, audio_tagging_loss=0.005725, over 14230.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08904, pruned_loss=0.01177, audio_tagging_loss=0.009637, over 2638806.62 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:32:39,158 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577550 2023-11-29 06:32:50,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3850386.6666666665, ans=0.0 2023-11-29 06:33:00,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3850386.6666666665, ans=0.125 2023-11-29 06:33:01,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3850386.6666666665, ans=0.125 2023-11-29 06:33:02,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.755e+01 9.458e+01 1.037e+02 1.447e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 06:33:28,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0 2023-11-29 06:33:29,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3850586.6666666665, ans=0.1 2023-11-29 06:33:36,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3850586.6666666665, ans=0.05 2023-11-29 06:33:41,944 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 450, loss[loss=0.06726, simple_loss=0.09576, pruned_loss=0.01236, audio_tagging_loss=0.007025, over 16614.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08894, pruned_loss=0.01173, audio_tagging_loss=0.00936, over 2729923.35 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:33:42,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577600 2023-11-29 06:33:53,811 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:34:10,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3850786.6666666665, ans=0.125 2023-11-29 06:34:11,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3850786.6666666665, ans=0.125 2023-11-29 06:34:28,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2023-11-29 06:34:45,317 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 500, loss[loss=0.05977, simple_loss=0.07744, pruned_loss=0.01148, audio_tagging_loss=0.009574, over 14779.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08893, pruned_loss=0.01164, audio_tagging_loss=0.009151, over 2801435.29 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:34:45,422 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577650 2023-11-29 06:34:46,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3850986.6666666665, ans=0.0 2023-11-29 06:34:51,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3850986.6666666665, ans=0.125 2023-11-29 06:35:08,394 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:35:09,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.909e+01 9.530e+01 1.043e+02 1.565e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 06:35:19,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3851120.0, ans=0.07 2023-11-29 06:35:24,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3851186.6666666665, ans=0.125 2023-11-29 06:35:24,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3851186.6666666665, ans=0.0 2023-11-29 06:35:45,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3851253.3333333335, ans=0.125 2023-11-29 06:35:47,392 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 550, loss[loss=0.0762, simple_loss=0.1074, pruned_loss=0.01677, audio_tagging_loss=0.0057, over 15399.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08848, pruned_loss=0.01159, audio_tagging_loss=0.009101, over 2856998.39 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:35:47,485 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577700 2023-11-29 06:35:54,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3851320.0, ans=0.07 2023-11-29 06:36:03,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3851386.6666666665, ans=0.125 2023-11-29 06:36:07,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2023-11-29 06:36:31,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3851520.0, ans=0.125 2023-11-29 06:36:49,870 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 600, loss[loss=0.05973, simple_loss=0.07471, pruned_loss=0.0121, audio_tagging_loss=0.01028, over 15008.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08832, pruned_loss=0.01154, audio_tagging_loss=0.008997, over 2893563.63 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:36:49,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577750 2023-11-29 06:37:03,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3851720.0, ans=0.1 2023-11-29 06:37:03,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3851720.0, ans=0.125 2023-11-29 06:37:07,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3851720.0, ans=0.125 2023-11-29 06:37:09,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3851720.0, ans=0.04949747468305833 2023-11-29 06:37:13,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3851786.6666666665, ans=0.0 2023-11-29 06:37:14,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.849e+01 9.501e+01 1.048e+02 1.415e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 06:37:16,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3851786.6666666665, ans=0.2 2023-11-29 06:37:26,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3851853.3333333335, ans=0.125 2023-11-29 06:37:52,660 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 650, loss[loss=0.06115, simple_loss=0.08572, pruned_loss=0.01002, audio_tagging_loss=0.008273, over 15401.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08887, pruned_loss=0.01161, audio_tagging_loss=0.008879, over 2930201.80 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:37:52,775 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577800 2023-11-29 06:37:57,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3851986.6666666665, ans=0.125 2023-11-29 06:38:18,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3852120.0, ans=0.125 2023-11-29 06:38:24,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.00 vs. limit=10.0 2023-11-29 06:38:34,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3852186.6666666665, ans=0.1 2023-11-29 06:38:44,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2023-11-29 06:38:55,860 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 700, loss[loss=0.07095, simple_loss=0.09319, pruned_loss=0.014, audio_tagging_loss=0.01035, over 14874.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08939, pruned_loss=0.01197, audio_tagging_loss=0.008866, over 2956243.44 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:38:55,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577850 2023-11-29 06:38:57,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3852320.0, ans=0.1 2023-11-29 06:39:01,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3852320.0, ans=0.125 2023-11-29 06:39:20,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.141e+01 9.968e+01 1.043e+02 1.174e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-29 06:39:45,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=3852586.6666666665, ans=12.0 2023-11-29 06:39:53,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.77 vs. limit=15.0 2023-11-29 06:39:58,579 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 750, loss[loss=0.07607, simple_loss=0.1131, pruned_loss=0.01311, audio_tagging_loss=0.006423, over 15890.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08962, pruned_loss=0.01211, audio_tagging_loss=0.008881, over 2982464.72 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:39:58,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577900 2023-11-29 06:40:03,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-11-29 06:40:04,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3852653.3333333335, ans=0.125 2023-11-29 06:40:12,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3852720.0, ans=0.125 2023-11-29 06:40:23,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3852786.6666666665, ans=0.95 2023-11-29 06:40:27,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=15.0 2023-11-29 06:40:45,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3852853.3333333335, ans=0.2 2023-11-29 06:40:53,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2023-11-29 06:41:01,463 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 800, loss[loss=0.06368, simple_loss=0.08963, pruned_loss=0.008899, audio_tagging_loss=0.009961, over 15613.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09005, pruned_loss=0.01218, audio_tagging_loss=0.008874, over 2998242.17 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:41:01,588 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577950 2023-11-29 06:41:22,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. 
limit=15.0 2023-11-29 06:41:26,076 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 9.191e+01 9.688e+01 1.032e+02 1.219e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 06:41:31,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3853120.0, ans=0.95 2023-11-29 06:41:41,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3853186.6666666665, ans=0.125 2023-11-29 06:42:04,093 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 850, loss[loss=0.05483, simple_loss=0.06921, pruned_loss=0.01052, audio_tagging_loss=0.009704, over 14919.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08855, pruned_loss=0.01171, audio_tagging_loss=0.00893, over 3007971.87 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:42:04,183 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578000 2023-11-29 06:42:26,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3853386.6666666665, ans=0.125 2023-11-29 06:42:33,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3853453.3333333335, ans=0.1 2023-11-29 06:42:43,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3853520.0, ans=0.125 2023-11-29 06:42:51,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-29 06:42:59,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3853586.6666666665, ans=0.0 2023-11-29 06:43:05,888 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 900, loss[loss=0.06984, simple_loss=0.09439, pruned_loss=0.01314, audio_tagging_loss=0.009506, over 15177.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08911, pruned_loss=0.01178, audio_tagging_loss=0.008967, over 3025191.57 frames. 
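The optim.py:476 lines report quartiles of recent gradient norms (min, 25%, median, 75%, max) together with a clipping threshold. In every entry here the threshold equals Clipping_scale times the logged median (2.0 x 9.688e+01 = 1.938e+02 for the entry above), so the rule appears to be median-based; percent-clipped would then be the share of recent steps whose norm exceeded that threshold. A sketch of that inferred rule, not quoted from optim.py (the real window would hold all recent norms; feeding in the five logged quartiles works because their middle element is the median):

    # Inferred rule behind "Clipping_scale=2.0, grad-norm quartiles ...
    # threshold=...": clip at clipping_scale times the median of a window
    # of recent gradient norms.
    import statistics

    def clipping_threshold(recent_grad_norms: list[float],
                           clipping_scale: float = 2.0) -> float:
        return clipping_scale * statistics.median(recent_grad_norms)

    quartiles = [78.08, 91.91, 96.88, 103.2, 121.9]  # the entry above
    print(clipping_threshold(quartiles))             # 193.76, i.e. 1.938e+02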
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:43:05,971 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578050 2023-11-29 06:43:15,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3853653.3333333335, ans=0.125 2023-11-29 06:43:20,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3853720.0, ans=0.0 2023-11-29 06:43:21,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3853720.0, ans=0.125 2023-11-29 06:43:27,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3853720.0, ans=22.5 2023-11-29 06:43:33,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.400e+01 1.003e+02 1.065e+02 1.240e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-29 06:43:33,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3853786.6666666665, ans=0.2 2023-11-29 06:43:55,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3853920.0, ans=0.2 2023-11-29 06:44:04,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3853920.0, ans=0.125 2023-11-29 06:44:04,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3853920.0, ans=0.125 2023-11-29 06:44:09,220 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 950, loss[loss=0.0551, simple_loss=0.07667, pruned_loss=0.007807, audio_tagging_loss=0.00896, over 14293.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08802, pruned_loss=0.01168, audio_tagging_loss=0.008941, over 3026431.84 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:44:09,364 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578100 2023-11-29 06:44:12,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-29 06:44:24,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854053.3333333335, ans=0.1 2023-11-29 06:44:36,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3854120.0, ans=0.125 2023-11-29 06:44:40,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3854120.0, ans=0.2 2023-11-29 06:44:53,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3854186.6666666665, ans=0.2 2023-11-29 06:45:02,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3854253.3333333335, ans=0.0 2023-11-29 06:45:11,223 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1000, loss[loss=0.04589, simple_loss=0.06482, pruned_loss=0.006353, audio_tagging_loss=0.007124, over 15217.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08898, pruned_loss=0.01172, audio_tagging_loss=0.008675, over 3035680.02 frames. 
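Most of the scaling.py:213 lines are ScheduledFloat reports: module hyperparameters (dropout probabilities, balancer probs, skip rates, whitening limits) whose current value `ans` is a deterministic function of batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative, not the recipe's actual schedule:

    # Toy ScheduledFloat: piecewise-linear in batch_count, flat outside the
    # breakpoints. The (0, 0.3) -> (20000, 0.1) ramp is made up; this deep
    # into training any such ramp has long since flattened out, which is
    # why the logged `ans` values barely move between entries.
    class ScheduledFloat:
        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x1 >= batch_count:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(3853720.0))  # -> 0.1, like the dropout_p entries above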
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:45:11,340 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578150 2023-11-29 06:45:28,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5 2023-11-29 06:45:29,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3854386.6666666665, ans=0.125 2023-11-29 06:45:37,172 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.899e+01 9.614e+01 1.019e+02 1.244e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 06:45:39,598 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:45:47,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2023-11-29 06:46:11,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3854653.3333333335, ans=0.125 2023-11-29 06:46:12,492 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1050, loss[loss=0.0653, simple_loss=0.08738, pruned_loss=0.01287, audio_tagging_loss=0.008738, over 15615.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08902, pruned_loss=0.01173, audio_tagging_loss=0.008473, over 3042863.04 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:46:12,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578200 2023-11-29 06:46:19,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3854653.3333333335, ans=0.125 2023-11-29 06:46:30,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3854720.0, ans=0.125 2023-11-29 06:46:37,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3854786.6666666665, ans=0.035 2023-11-29 06:46:45,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3854786.6666666665, ans=0.125 2023-11-29 06:46:47,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3854786.6666666665, ans=0.0 2023-11-29 06:47:15,097 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1100, loss[loss=0.06172, simple_loss=0.09007, pruned_loss=0.009416, audio_tagging_loss=0.007262, over 15820.00 frames. ], tot_loss[loss=0.06397, simple_loss=0.08775, pruned_loss=0.01159, audio_tagging_loss=0.008503, over 3043517.60 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:47:15,196 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578250 2023-11-29 06:47:19,565 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:47:22,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3854986.6666666665, ans=0.5 2023-11-29 06:47:37,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3855053.3333333335, ans=0.0 2023-11-29 06:47:40,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.246e+01 9.671e+01 1.044e+02 1.404e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 06:47:43,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3855120.0, ans=0.125 2023-11-29 06:48:18,222 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1150, loss[loss=0.05967, simple_loss=0.08134, pruned_loss=0.01171, audio_tagging_loss=0.00729, over 14929.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08794, pruned_loss=0.01179, audio_tagging_loss=0.008564, over 3043657.19 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:48:18,340 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578300 2023-11-29 06:48:20,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3855320.0, ans=0.125 2023-11-29 06:48:21,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2023-11-29 06:48:26,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3855320.0, ans=0.0 2023-11-29 06:48:32,661 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:49:09,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-29 06:49:11,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-29 06:49:12,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-29 06:49:15,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-29 06:49:19,682 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1200, loss[loss=0.07666, simple_loss=0.1115, pruned_loss=0.01545, audio_tagging_loss=0.005428, over 16493.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08779, pruned_loss=0.01173, audio_tagging_loss=0.008581, over 3036662.30 frames. 
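The recurring "Exclude cut ... unbalanced/..." warnings all follow the same pattern: a one-second AudioSet clip (100 feature frames) carries the dummy placeholder transcript, which tokenizes to 24 BPE tokens, but after roughly 4x subsampling only 23 encoder frames remain, and a transducer loss is undefined when the label sequence is longer than the encoder output. A sketch of that filter; the subsampling arithmetic is one formula consistent with the logged 100 -> 23, not necessarily the exact one in the recipe:

    # Drop cuts whose token count exceeds the post-subsampling frame count,
    # matching the numbers in the warnings (100 frames -> 23, 24 tokens).
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2  # ~4x subsampling

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # -> 23
    print(keep_cut(100, 24))              # -> False: excluded, as logged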
], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:49:19,785 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578350 2023-11-29 06:49:29,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3855653.3333333335, ans=0.025 2023-11-29 06:49:35,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3855720.0, ans=0.125 2023-11-29 06:49:37,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2023-11-29 06:49:42,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3855720.0, ans=0.2 2023-11-29 06:49:42,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3855720.0, ans=0.125 2023-11-29 06:49:47,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.935e+01 9.457e+01 1.024e+02 1.157e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-29 06:49:56,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3855853.3333333335, ans=0.125 2023-11-29 06:50:05,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3855853.3333333335, ans=0.02 2023-11-29 06:50:15,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3855920.0, ans=0.2 2023-11-29 06:50:21,569 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1250, loss[loss=0.05077, simple_loss=0.06821, pruned_loss=0.007512, audio_tagging_loss=0.009152, over 14100.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08804, pruned_loss=0.01185, audio_tagging_loss=0.008542, over 3038913.74 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:50:21,655 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578400 2023-11-29 06:50:30,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3855986.6666666665, ans=0.025 2023-11-29 06:50:49,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3856120.0, ans=0.05 2023-11-29 06:50:55,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3856120.0, ans=0.05 2023-11-29 06:51:10,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3856253.3333333335, ans=0.125 2023-11-29 06:51:19,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3856253.3333333335, ans=0.125 2023-11-29 06:51:24,788 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1300, loss[loss=0.06497, simple_loss=0.09318, pruned_loss=0.01136, audio_tagging_loss=0.007025, over 15920.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08795, pruned_loss=0.01166, audio_tagging_loss=0.008532, over 3037517.04 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:51:24,891 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578450 2023-11-29 06:51:29,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-29 06:51:30,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3856320.0, ans=0.0 2023-11-29 06:51:33,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3856320.0, ans=0.1 2023-11-29 06:51:40,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3856386.6666666665, ans=0.125 2023-11-29 06:51:46,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3856386.6666666665, ans=0.0 2023-11-29 06:51:50,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 8.934e+01 9.381e+01 1.015e+02 1.347e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-29 06:52:15,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=3856586.6666666665, ans=0.2 2023-11-29 06:52:20,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3856586.6666666665, ans=0.5 2023-11-29 06:52:23,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3856586.6666666665, ans=0.0 2023-11-29 06:52:25,836 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1350, loss[loss=0.05896, simple_loss=0.07162, pruned_loss=0.01545, audio_tagging_loss=0.007703, over 14064.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.0878, pruned_loss=0.01172, audio_tagging_loss=0.008501, over 3043322.57 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:52:25,909 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578500 2023-11-29 06:52:32,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2023-11-29 06:52:43,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3856720.0, ans=0.125 2023-11-29 06:52:50,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3856786.6666666665, ans=0.125 2023-11-29 06:53:10,882 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 06:53:20,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3856920.0, ans=0.125 2023-11-29 06:53:23,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3856920.0, ans=0.125 2023-11-29 06:53:26,927 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1400, loss[loss=0.05585, simple_loss=0.07454, pruned_loss=0.01036, audio_tagging_loss=0.008219, over 15606.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08815, pruned_loss=0.01182, audio_tagging_loss=0.008487, over 3044400.22 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:53:27,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578550 2023-11-29 06:53:27,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3856986.6666666665, ans=0.025 2023-11-29 06:53:33,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3856986.6666666665, ans=0.0 2023-11-29 06:53:41,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3857053.3333333335, ans=0.125 2023-11-29 06:53:53,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3857120.0, ans=0.2 2023-11-29 06:53:54,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 9.091e+01 9.742e+01 1.050e+02 1.544e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 06:54:06,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3857186.6666666665, ans=0.125 2023-11-29 06:54:18,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2023-11-29 06:54:29,528 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1450, loss[loss=0.06732, simple_loss=0.0957, pruned_loss=0.01166, audio_tagging_loss=0.007806, over 15978.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08808, pruned_loss=0.01166, audio_tagging_loss=0.008616, over 3047748.73 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:54:29,603 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578600 2023-11-29 06:55:00,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-29 06:55:25,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3857586.6666666665, ans=0.0 2023-11-29 06:55:28,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3857586.6666666665, ans=0.125 2023-11-29 06:55:31,268 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1500, loss[loss=0.06721, simple_loss=0.0925, pruned_loss=0.01055, audio_tagging_loss=0.01041, over 15118.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08882, pruned_loss=0.01183, audio_tagging_loss=0.008624, over 3045636.27 frames. 
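The scaling.py:1022 lines compare a per-module whitening metric against a scheduled limit (e.g. "metric=8.45 vs. limit=12.0" above); presumably the Whiten module only intervenes when the metric exceeds its limit. A toy version under the assumption that the metric measures how far the activation covariance is from a multiple of the identity, as E[lambda^2] / E[lambda]^2 over the covariance eigenvalues (exactly 1.0 for perfectly white features); this mirrors the logged behaviour but is not quoted from scaling.py:

    # Hypothetical whitening metric: eigenvalue spread of the activation
    # covariance; ~1.0 means white (isotropic), larger means more skewed.
    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels)
        cov = x.t() @ x / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    white = torch.randn(10000, 192)
    print(whitening_metric(white))                                  # ~1.0
    print(whitening_metric(white * torch.linspace(0.1, 3.0, 192)))  # >> 1.0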
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:55:31,348 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578650 2023-11-29 06:55:38,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3857653.3333333335, ans=0.1 2023-11-29 06:55:43,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3857720.0, ans=0.2 2023-11-29 06:55:57,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.098e+01 9.715e+01 1.024e+02 1.252e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 06:56:12,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3857853.3333333335, ans=0.125 2023-11-29 06:56:23,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=12.0 2023-11-29 06:56:24,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3857920.0, ans=0.1 2023-11-29 06:56:30,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3857920.0, ans=0.1 2023-11-29 06:56:31,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-29 06:56:32,890 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1550, loss[loss=0.06526, simple_loss=0.0908, pruned_loss=0.01062, audio_tagging_loss=0.009241, over 15733.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08842, pruned_loss=0.01187, audio_tagging_loss=0.008795, over 3049818.38 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:56:32,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578700 2023-11-29 06:56:44,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=22.5 2023-11-29 06:56:44,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3858053.3333333335, ans=0.125 2023-11-29 06:56:50,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3858053.3333333335, ans=0.125 2023-11-29 06:56:51,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3858053.3333333335, ans=0.125 2023-11-29 06:57:00,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3858120.0, ans=0.0 2023-11-29 06:57:08,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3858186.6666666665, ans=0.0 2023-11-29 06:57:34,233 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1600, loss[loss=0.06829, simple_loss=0.09891, pruned_loss=0.01031, audio_tagging_loss=0.008529, over 15721.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08889, pruned_loss=0.01182, audio_tagging_loss=0.008789, over 3047396.16 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:57:34,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578750 2023-11-29 06:57:53,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3858386.6666666665, ans=0.0 2023-11-29 06:57:57,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3858453.3333333335, ans=0.0 2023-11-29 06:58:00,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 9.073e+01 9.678e+01 1.045e+02 1.590e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 06:58:11,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3858520.0, ans=0.125 2023-11-29 06:58:14,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3858520.0, ans=0.125 2023-11-29 06:58:35,995 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1650, loss[loss=0.07077, simple_loss=0.1019, pruned_loss=0.01199, audio_tagging_loss=0.00785, over 15610.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08829, pruned_loss=0.01167, audio_tagging_loss=0.008821, over 3055664.65 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:58:36,081 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578800 2023-11-29 06:58:39,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3858653.3333333335, ans=0.2 2023-11-29 06:59:01,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=22.5 2023-11-29 06:59:05,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3858786.6666666665, ans=0.04949747468305833 2023-11-29 06:59:14,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3858853.3333333335, ans=0.125 2023-11-29 06:59:23,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3858853.3333333335, ans=0.125 2023-11-29 06:59:28,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3858920.0, ans=0.125 2023-11-29 06:59:29,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3858920.0, ans=0.0 2023-11-29 06:59:29,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3858920.0, ans=0.1 2023-11-29 06:59:37,372 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1700, loss[loss=0.04658, simple_loss=0.05828, pruned_loss=0.005966, audio_tagging_loss=0.01148, over 14723.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08783, pruned_loss=0.01168, audio_tagging_loss=0.008893, over 3047150.20 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:59:37,477 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578850 2023-11-29 06:59:45,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3858986.6666666665, ans=0.0 2023-11-29 06:59:46,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3858986.6666666665, ans=0.0 2023-11-29 06:59:56,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3859053.3333333335, ans=0.125 2023-11-29 07:00:05,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3859120.0, ans=0.125 2023-11-29 07:00:07,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.933e+01 9.056e+01 9.736e+01 1.037e+02 1.295e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 07:00:08,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3859120.0, ans=0.125 2023-11-29 07:00:39,554 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1750, loss[loss=0.0751, simple_loss=0.1109, pruned_loss=0.01455, audio_tagging_loss=0.005127, over 15408.00 frames. ], tot_loss[loss=0.06385, simple_loss=0.08705, pruned_loss=0.01146, audio_tagging_loss=0.008858, over 3049616.40 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:00:39,638 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578900 2023-11-29 07:00:54,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2023-11-29 07:01:16,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-29 07:01:20,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3859520.0, ans=0.2 2023-11-29 07:01:20,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2023-11-29 07:01:35,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3859586.6666666665, ans=0.0 2023-11-29 07:01:42,545 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1800, loss[loss=0.05359, simple_loss=0.07831, pruned_loss=0.007407, audio_tagging_loss=0.007025, over 15563.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.08754, pruned_loss=0.01152, audio_tagging_loss=0.008655, over 3049609.08 frames. 
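Every tot_loss entry in this stretch reports lr: 1.38e-03. Assuming the Eden schedule commonly used in icefall recipes, with illustrative settings base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 (assumptions here, not read from this log), that value is reproduced at this point in training:

    # Eden-style schedule: lr decays along both the batch and the epoch
    # dimension. All constants below are assumed for illustration.
    def eden_lr(base_lr: float, batch: int, finished_epochs: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2)
                        / lr_batches ** 2) ** -0.25
        epoch_factor = ((finished_epochs ** 2 + lr_epochs ** 2)
                        / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # 48 finished epochs while epoch 49 trains, batch idx ~578900:
    print(eden_lr(0.045, batch=578_900, finished_epochs=48))  # ~1.38e-03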
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:01:42,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578950 2023-11-29 07:01:48,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3859653.3333333335, ans=0.125 2023-11-29 07:01:56,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3859720.0, ans=0.0 2023-11-29 07:02:09,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3859786.6666666665, ans=0.0 2023-11-29 07:02:11,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 9.159e+01 9.748e+01 1.040e+02 1.409e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 07:02:13,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-29 07:02:19,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3859853.3333333335, ans=0.1 2023-11-29 07:02:43,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3859986.6666666665, ans=0.0 2023-11-29 07:02:43,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0 2023-11-29 07:02:44,383 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1850, loss[loss=0.06054, simple_loss=0.07624, pruned_loss=0.01142, audio_tagging_loss=0.01101, over 14459.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08829, pruned_loss=0.0117, audio_tagging_loss=0.008537, over 3051911.10 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:02:44,463 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579000 2023-11-29 07:02:49,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2023-11-29 07:02:52,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3859986.6666666665, ans=0.1 2023-11-29 07:02:58,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0 2023-11-29 07:03:30,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3860186.6666666665, ans=10.0 2023-11-29 07:03:40,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2023-11-29 07:03:46,136 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1900, loss[loss=0.05606, simple_loss=0.08426, pruned_loss=0.007573, audio_tagging_loss=0.006354, over 14508.00 frames. ], tot_loss[loss=0.06349, simple_loss=0.08694, pruned_loss=0.01152, audio_tagging_loss=0.008492, over 3048458.05 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:03:46,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579050 2023-11-29 07:03:57,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. 
limit=12.0 2023-11-29 07:04:07,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3860386.6666666665, ans=0.125 2023-11-29 07:04:08,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3860386.6666666665, ans=0.125 2023-11-29 07:04:08,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2023-11-29 07:04:14,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.930e+01 9.376e+01 1.025e+02 1.828e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-29 07:04:22,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3860520.0, ans=0.0 2023-11-29 07:04:43,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3860586.6666666665, ans=0.05 2023-11-29 07:04:45,410 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:04:47,572 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1950, loss[loss=0.06161, simple_loss=0.09, pruned_loss=0.007707, audio_tagging_loss=0.008904, over 15853.00 frames. ], tot_loss[loss=0.06326, simple_loss=0.08649, pruned_loss=0.01147, audio_tagging_loss=0.008551, over 3049009.90 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:04:47,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579100 2023-11-29 07:05:11,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3860786.6666666665, ans=0.2 2023-11-29 07:05:12,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3860786.6666666665, ans=0.2 2023-11-29 07:05:17,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3860786.6666666665, ans=0.125 2023-11-29 07:05:22,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3860786.6666666665, ans=0.1 2023-11-29 07:05:31,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2023-11-29 07:05:48,967 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2000, loss[loss=0.05692, simple_loss=0.07896, pruned_loss=0.007725, audio_tagging_loss=0.009716, over 13989.00 frames. ], tot_loss[loss=0.06359, simple_loss=0.08686, pruned_loss=0.01154, audio_tagging_loss=0.008623, over 3043701.71 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:05:49,044 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579150 2023-11-29 07:05:59,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.60 vs. 
limit=10.0 2023-11-29 07:06:08,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3861053.3333333335, ans=0.0 2023-11-29 07:06:16,982 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 9.275e+01 1.004e+02 1.066e+02 1.335e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-29 07:06:36,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3861253.3333333335, ans=0.1 2023-11-29 07:06:47,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3861253.3333333335, ans=0.125 2023-11-29 07:06:50,230 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2050, loss[loss=0.07453, simple_loss=0.1002, pruned_loss=0.01678, audio_tagging_loss=0.007662, over 15310.00 frames. ], tot_loss[loss=0.06389, simple_loss=0.08766, pruned_loss=0.01155, audio_tagging_loss=0.008507, over 3050282.81 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:06:50,300 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579200 2023-11-29 07:06:53,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3861320.0, ans=0.07 2023-11-29 07:07:18,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3861453.3333333335, ans=0.1 2023-11-29 07:07:19,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2023-11-29 07:07:23,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3861453.3333333335, ans=0.2 2023-11-29 07:07:26,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2023-11-29 07:07:27,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3861520.0, ans=0.05 2023-11-29 07:07:36,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. limit=10.0 2023-11-29 07:07:44,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3861586.6666666665, ans=0.125 2023-11-29 07:07:51,830 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2100, loss[loss=0.08753, simple_loss=0.1268, pruned_loss=0.01687, audio_tagging_loss=0.007248, over 15963.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08818, pruned_loss=0.01144, audio_tagging_loss=0.00849, over 3053269.52 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:07:51,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579250 2023-11-29 07:08:18,435 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:08:20,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 9.068e+01 9.532e+01 1.017e+02 1.251e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 07:08:52,564 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2150, loss[loss=0.0701, simple_loss=0.09943, pruned_loss=0.01309, audio_tagging_loss=0.007286, over 15356.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08847, pruned_loss=0.01168, audio_tagging_loss=0.008488, over 3048726.19 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:08:52,646 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579300 2023-11-29 07:08:54,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2023-11-29 07:08:56,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3861986.6666666665, ans=0.0 2023-11-29 07:09:01,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3861986.6666666665, ans=0.125 2023-11-29 07:09:04,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3862053.3333333335, ans=0.125 2023-11-29 07:09:15,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3862053.3333333335, ans=0.09899494936611666 2023-11-29 07:09:31,189 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:09:39,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3862186.6666666665, ans=0.125 2023-11-29 07:09:45,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3862253.3333333335, ans=0.125 2023-11-29 07:09:55,033 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2200, loss[loss=0.0685, simple_loss=0.1012, pruned_loss=0.01066, audio_tagging_loss=0.00726, over 15003.00 frames. ], tot_loss[loss=0.06389, simple_loss=0.08767, pruned_loss=0.01153, audio_tagging_loss=0.008526, over 3041522.43 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:09:55,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579350 2023-11-29 07:10:05,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3862320.0, ans=0.09899494936611666 2023-11-29 07:10:22,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.191e+01 9.631e+01 1.029e+02 1.249e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 07:10:28,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3862453.3333333335, ans=0.125 2023-11-29 07:10:32,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3862520.0, ans=0.125 2023-11-29 07:10:39,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3862520.0, ans=0.125 2023-11-29 07:10:45,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3862586.6666666665, ans=0.0 2023-11-29 07:10:50,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3862586.6666666665, ans=0.035 2023-11-29 07:10:51,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3862586.6666666665, ans=0.0 2023-11-29 07:10:55,455 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2250, loss[loss=0.08177, simple_loss=0.1275, pruned_loss=0.01333, audio_tagging_loss=0.004697, over 16048.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08906, pruned_loss=0.01165, audio_tagging_loss=0.008524, over 3040985.16 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:10:55,536 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579400 2023-11-29 07:10:56,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3862653.3333333335, ans=0.2 2023-11-29 07:11:10,422 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:11:12,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=15.0 2023-11-29 07:11:17,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3862720.0, ans=0.1 2023-11-29 07:11:29,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2023-11-29 07:11:53,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. limit=10.0 2023-11-29 07:11:56,109 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2300, loss[loss=0.06415, simple_loss=0.08945, pruned_loss=0.01124, audio_tagging_loss=0.008188, over 14166.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08957, pruned_loss=0.01183, audio_tagging_loss=0.008531, over 3043624.07 frames. 
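The grad_scale field oscillates between 16.0 and 32.0 across these entries, the signature of dynamic fp16 loss scaling: the scale doubles after a long enough run of overflow-free steps and halves whenever an overflow is detected. A sketch using PyTorch's stock GradScaler (the recipe may wrap its own variant; the constructor arguments below are illustrative):

    # fp16 dynamic loss scaling; grad_scale in the log corresponds to
    # scaler.get_scale().
    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,
        growth_factor=2.0,    # 16 -> 32 after a clean stretch
        backoff_factor=0.5,   # 32 -> 16 on overflow
        growth_interval=2000,
    )

    # Inside the training loop (schematic):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
    #   current_grad_scale = scaler.get_scale()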
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:11:56,175 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579450 2023-11-29 07:12:04,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3862986.6666666665, ans=0.04949747468305833 2023-11-29 07:12:26,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 9.023e+01 9.872e+01 1.066e+02 2.413e+02, threshold=1.974e+02, percent-clipped=1.0 2023-11-29 07:12:29,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3863120.0, ans=0.2 2023-11-29 07:12:37,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3863186.6666666665, ans=0.09899494936611666 2023-11-29 07:12:51,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3863253.3333333335, ans=0.0 2023-11-29 07:12:52,598 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:12:59,102 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2350, loss[loss=0.05724, simple_loss=0.07682, pruned_loss=0.008744, audio_tagging_loss=0.01009, over 15686.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08851, pruned_loss=0.01162, audio_tagging_loss=0.00865, over 3045194.80 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:12:59,184 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579500 2023-11-29 07:13:22,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3863453.3333333335, ans=0.125 2023-11-29 07:13:35,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3863520.0, ans=0.125 2023-11-29 07:13:36,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3863520.0, ans=0.0 2023-11-29 07:13:54,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-29 07:14:00,820 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2400, loss[loss=0.09727, simple_loss=0.1179, pruned_loss=0.02739, audio_tagging_loss=0.01093, over 13975.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08954, pruned_loss=0.01181, audio_tagging_loss=0.008717, over 3045748.51 frames. ], batch size: 52, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:14:00,907 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579550 2023-11-29 07:14:03,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. 
limit=15.0 2023-11-29 07:14:11,543 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:14:27,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.61 vs. limit=10.0 2023-11-29 07:14:29,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 9.216e+01 9.806e+01 1.047e+02 1.244e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 07:14:31,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. limit=10.0 2023-11-29 07:14:31,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3863786.6666666665, ans=0.0 2023-11-29 07:14:46,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3863853.3333333335, ans=0.09899494936611666 2023-11-29 07:14:57,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3863920.0, ans=0.125 2023-11-29 07:15:00,885 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2450, loss[loss=0.07544, simple_loss=0.1091, pruned_loss=0.01381, audio_tagging_loss=0.007059, over 14879.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.0894, pruned_loss=0.01168, audio_tagging_loss=0.008776, over 3046891.83 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:15:00,969 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579600 2023-11-29 07:15:04,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-11-29 07:15:22,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3864053.3333333335, ans=0.125 2023-11-29 07:15:35,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.62 vs. limit=10.0 2023-11-29 07:16:02,320 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2500, loss[loss=0.04119, simple_loss=0.05533, pruned_loss=0.005118, audio_tagging_loss=0.008405, over 15791.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08881, pruned_loss=0.01174, audio_tagging_loss=0.008799, over 3041805.34 frames. 
], batch size: 62, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:16:02,422 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579650 2023-11-29 07:16:05,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3864320.0, ans=0.125 2023-11-29 07:16:21,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3864386.6666666665, ans=0.125 2023-11-29 07:16:31,832 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 9.100e+01 9.554e+01 1.019e+02 1.302e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 07:16:38,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3864520.0, ans=0.125 2023-11-29 07:16:58,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-29 07:17:00,740 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:17:04,439 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2550, loss[loss=0.05755, simple_loss=0.07268, pruned_loss=0.009691, audio_tagging_loss=0.01152, over 14347.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08866, pruned_loss=0.0118, audio_tagging_loss=0.008638, over 3039517.10 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:17:05,184 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579700 2023-11-29 07:17:06,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2023-11-29 07:17:37,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3864786.6666666665, ans=0.125 2023-11-29 07:17:41,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3864853.3333333335, ans=0.125 2023-11-29 07:17:54,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3864920.0, ans=0.2 2023-11-29 07:17:57,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3864920.0, ans=0.125 2023-11-29 07:17:58,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.51 vs. limit=22.5 2023-11-29 07:18:00,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3864920.0, ans=0.125 2023-11-29 07:18:05,612 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2600, loss[loss=0.06766, simple_loss=0.09876, pruned_loss=0.01151, audio_tagging_loss=0.006772, over 14887.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08831, pruned_loss=0.01175, audio_tagging_loss=0.008486, over 3046156.48 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:18:05,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579750 2023-11-29 07:18:36,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.765e+01 9.414e+01 9.856e+01 2.856e+02, threshold=1.883e+02, percent-clipped=1.0 2023-11-29 07:18:37,140 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:18:40,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3865120.0, ans=0.0 2023-11-29 07:18:41,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3865186.6666666665, ans=0.0 2023-11-29 07:18:46,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3865186.6666666665, ans=0.125 2023-11-29 07:19:00,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3865253.3333333335, ans=0.125 2023-11-29 07:19:01,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3865253.3333333335, ans=0.1 2023-11-29 07:19:05,826 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2650, loss[loss=0.08883, simple_loss=0.1216, pruned_loss=0.02117, audio_tagging_loss=0.006839, over 15639.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08877, pruned_loss=0.01184, audio_tagging_loss=0.008394, over 3045870.35 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:19:05,910 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579800 2023-11-29 07:19:07,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3865320.0, ans=0.125 2023-11-29 07:19:10,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3865320.0, ans=0.125 2023-11-29 07:19:25,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3865386.6666666665, ans=0.02 2023-11-29 07:19:30,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. 
limit=15.0 2023-11-29 07:19:31,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3865453.3333333335, ans=0.0 2023-11-29 07:19:41,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3865453.3333333335, ans=0.1 2023-11-29 07:19:46,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3865520.0, ans=0.125 2023-11-29 07:19:51,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3865520.0, ans=0.1 2023-11-29 07:19:51,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3865520.0, ans=0.1 2023-11-29 07:20:01,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3865586.6666666665, ans=0.125 2023-11-29 07:20:06,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3865653.3333333335, ans=0.0 2023-11-29 07:20:06,919 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2700, loss[loss=0.05505, simple_loss=0.06234, pruned_loss=0.009465, audio_tagging_loss=0.01441, over 14791.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08843, pruned_loss=0.01173, audio_tagging_loss=0.008409, over 3051134.44 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:20:07,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579850 2023-11-29 07:20:17,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0 2023-11-29 07:20:36,007 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:20:36,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 9.168e+01 9.728e+01 1.035e+02 1.379e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 07:21:07,818 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2750, loss[loss=0.07851, simple_loss=0.1054, pruned_loss=0.01745, audio_tagging_loss=0.008337, over 15868.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.08789, pruned_loss=0.0116, audio_tagging_loss=0.008403, over 3050758.92 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:21:07,899 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579900 2023-11-29 07:21:09,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3865986.6666666665, ans=0.0 2023-11-29 07:21:11,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3865986.6666666665, ans=0.0 2023-11-29 07:21:21,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3866053.3333333335, ans=0.0 2023-11-29 07:21:32,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3866120.0, ans=0.0 2023-11-29 07:21:40,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3866120.0, ans=0.0 2023-11-29 07:21:59,934 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:22:08,097 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2800, loss[loss=0.0841, simple_loss=0.1148, pruned_loss=0.01983, audio_tagging_loss=0.006866, over 14962.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08813, pruned_loss=0.01179, audio_tagging_loss=0.008323, over 3049064.68 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:22:08,177 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579950 2023-11-29 07:22:32,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2023-11-29 07:22:33,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3866453.3333333335, ans=0.2 2023-11-29 07:22:37,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3866453.3333333335, ans=0.125 2023-11-29 07:22:38,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-29 07:22:39,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.979e+01 9.442e+01 1.009e+02 1.188e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 07:22:46,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3866520.0, ans=0.0 2023-11-29 07:22:52,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3866520.0, ans=0.0 2023-11-29 07:22:52,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2023-11-29 07:23:09,364 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2850, loss[loss=0.03872, simple_loss=0.04638, pruned_loss=0.003934, audio_tagging_loss=0.0116, over 15285.00 frames. 
], tot_loss[loss=0.06418, simple_loss=0.08819, pruned_loss=0.01174, audio_tagging_loss=0.008337, over 3054489.75 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:23:09,442 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580000 2023-11-29 07:23:10,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3866653.3333333335, ans=0.0 2023-11-29 07:23:35,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.09 vs. limit=10.0 2023-11-29 07:23:57,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3866853.3333333335, ans=0.2 2023-11-29 07:24:13,537 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2900, loss[loss=0.07583, simple_loss=0.1053, pruned_loss=0.01593, audio_tagging_loss=0.007264, over 15947.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08871, pruned_loss=0.01197, audio_tagging_loss=0.008365, over 3057612.14 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:24:13,622 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580050 2023-11-29 07:24:40,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3867120.0, ans=0.125 2023-11-29 07:24:44,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.980e+01 9.788e+01 1.062e+02 1.550e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 07:24:47,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3867120.0, ans=0.0 2023-11-29 07:25:01,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2023-11-29 07:25:07,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3867253.3333333335, ans=0.2 2023-11-29 07:25:10,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3867253.3333333335, ans=0.0 2023-11-29 07:25:14,047 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2950, loss[loss=0.06501, simple_loss=0.08876, pruned_loss=0.01342, audio_tagging_loss=0.007213, over 15014.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08856, pruned_loss=0.01189, audio_tagging_loss=0.00846, over 3059190.36 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:25:14,125 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580100 2023-11-29 07:26:15,557 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3000, loss[loss=0.04003, simple_loss=0.05053, pruned_loss=0.003793, audio_tagging_loss=0.01097, over 14416.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08839, pruned_loss=0.01196, audio_tagging_loss=0.008472, over 3054080.77 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:26:15,558 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 07:26:45,969 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0053, 5.8559, 5.6325, 5.5489], device='cuda:3') 2023-11-29 07:26:49,212 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0120, 5.8916, 5.6552, 5.5918], device='cuda:3') 2023-11-29 07:26:54,597 INFO [train_asr.py:1267] (3/4) Epoch 49, validation: loss=0.05747, simple_loss=0.05054, pruned_loss=0.005474, audio_tagging_loss=0.02673, over 4681554.00 frames. 2023-11-29 07:26:54,597 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 07:26:54,685 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580150 2023-11-29 07:27:03,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3867653.3333333335, ans=0.125 2023-11-29 07:27:14,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3867720.0, ans=0.2 2023-11-29 07:27:15,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-29 07:27:16,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. limit=10.0 2023-11-29 07:27:26,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 9.023e+01 9.601e+01 1.027e+02 1.356e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 07:27:26,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3867786.6666666665, ans=0.125 2023-11-29 07:27:55,397 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3050, loss[loss=0.07813, simple_loss=0.121, pruned_loss=0.01191, audio_tagging_loss=0.005736, over 15058.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08907, pruned_loss=0.01191, audio_tagging_loss=0.008478, over 3051288.50 frames. ], batch size: 51, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:27:55,480 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580200 2023-11-29 07:28:04,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3867986.6666666665, ans=0.125 2023-11-29 07:28:10,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2023-11-29 07:28:12,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=22.5 2023-11-29 07:28:15,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3868053.3333333335, ans=0.125 2023-11-29 07:28:21,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3868120.0, ans=0.125 2023-11-29 07:28:32,372 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:28:45,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3868253.3333333335, ans=0.125 2023-11-29 07:28:57,691 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3100, loss[loss=0.05879, simple_loss=0.08335, pruned_loss=0.007093, audio_tagging_loss=0.01002, over 15234.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08888, pruned_loss=0.01176, audio_tagging_loss=0.008525, over 3044096.22 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:28:57,782 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580250 2023-11-29 07:29:08,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.00 vs. limit=22.5 2023-11-29 07:29:09,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3868386.6666666665, ans=0.125 2023-11-29 07:29:21,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3868453.3333333335, ans=0.125 2023-11-29 07:29:29,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.858e+01 9.570e+01 1.021e+02 1.337e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 07:29:35,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3868520.0, ans=0.1 2023-11-29 07:29:42,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3868520.0, ans=0.1 2023-11-29 07:29:43,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.35 vs. limit=22.5 2023-11-29 07:29:48,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5 2023-11-29 07:29:59,539 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3150, loss[loss=0.06894, simple_loss=0.1033, pruned_loss=0.008709, audio_tagging_loss=0.008574, over 15818.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08969, pruned_loss=0.01193, audio_tagging_loss=0.008564, over 3042053.53 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:29:59,622 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580300 2023-11-29 07:30:07,540 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:30:15,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. 
limit=15.0 2023-11-29 07:30:15,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3868720.0, ans=0.125 2023-11-29 07:30:26,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3868786.6666666665, ans=0.125 2023-11-29 07:30:33,486 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:30:38,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=12.0 2023-11-29 07:30:43,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3868853.3333333335, ans=0.0 2023-11-29 07:30:44,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3868853.3333333335, ans=0.125 2023-11-29 07:30:46,843 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:31:01,064 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3200, loss[loss=0.05469, simple_loss=0.06882, pruned_loss=0.007433, audio_tagging_loss=0.01285, over 15035.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08904, pruned_loss=0.01175, audio_tagging_loss=0.00873, over 3045921.41 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:31:01,148 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580350 2023-11-29 07:31:04,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3868986.6666666665, ans=0.0 2023-11-29 07:31:12,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3869053.3333333335, ans=0.1 2023-11-29 07:31:33,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.044e+01 8.935e+01 9.459e+01 1.020e+02 1.289e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 07:31:47,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3869186.6666666665, ans=0.0 2023-11-29 07:31:55,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3869253.3333333335, ans=0.1 2023-11-29 07:32:02,193 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3250, loss[loss=0.07183, simple_loss=0.1013, pruned_loss=0.01199, audio_tagging_loss=0.009218, over 15046.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08849, pruned_loss=0.01151, audio_tagging_loss=0.00884, over 3049820.03 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:32:02,271 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580400 2023-11-29 07:32:19,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0 2023-11-29 07:33:04,507 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3300, loss[loss=0.07158, simple_loss=0.09973, pruned_loss=0.01497, audio_tagging_loss=0.006748, over 14315.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08878, pruned_loss=0.01162, audio_tagging_loss=0.008951, over 3050328.53 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:33:04,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580450 2023-11-29 07:33:07,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3869653.3333333335, ans=0.0 2023-11-29 07:33:24,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3869720.0, ans=10.0 2023-11-29 07:33:37,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.902e+01 9.466e+01 1.005e+02 1.164e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-29 07:33:41,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2023-11-29 07:34:06,873 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3350, loss[loss=0.05301, simple_loss=0.07063, pruned_loss=0.007717, audio_tagging_loss=0.009981, over 16007.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08945, pruned_loss=0.01175, audio_tagging_loss=0.008787, over 3054324.83 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:34:06,963 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580500 2023-11-29 07:34:24,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=22.5 2023-11-29 07:35:08,973 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3400, loss[loss=0.07831, simple_loss=0.1064, pruned_loss=0.01798, audio_tagging_loss=0.007111, over 15929.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09045, pruned_loss=0.01194, audio_tagging_loss=0.008572, over 3054238.47 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:35:09,049 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580550 2023-11-29 07:35:09,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2023-11-29 07:35:11,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3870320.0, ans=0.125 2023-11-29 07:35:20,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-11-29 07:35:31,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3870386.6666666665, ans=0.125 2023-11-29 07:35:31,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3870386.6666666665, ans=0.0 2023-11-29 07:35:31,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3870386.6666666665, ans=0.125 2023-11-29 07:35:41,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.38 vs. 
limit=15.0 2023-11-29 07:35:41,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.012e+01 9.460e+01 1.056e+02 1.309e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 07:35:53,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3870520.0, ans=0.125 2023-11-29 07:35:59,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3870586.6666666665, ans=0.0 2023-11-29 07:36:05,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3870586.6666666665, ans=0.0 2023-11-29 07:36:11,813 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3450, loss[loss=0.06483, simple_loss=0.07931, pruned_loss=0.01486, audio_tagging_loss=0.01033, over 13699.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.0888, pruned_loss=0.01161, audio_tagging_loss=0.008555, over 3053275.30 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:36:11,886 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580600 2023-11-29 07:36:29,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3870720.0, ans=0.1 2023-11-29 07:36:33,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3870720.0, ans=0.0 2023-11-29 07:36:47,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.40 vs. limit=22.5 2023-11-29 07:36:48,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3870853.3333333335, ans=0.0 2023-11-29 07:36:48,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3870853.3333333335, ans=0.125 2023-11-29 07:37:01,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3870920.0, ans=0.0 2023-11-29 07:37:12,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3870986.6666666665, ans=0.125 2023-11-29 07:37:13,499 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3500, loss[loss=0.07692, simple_loss=0.1113, pruned_loss=0.01337, audio_tagging_loss=0.007903, over 15282.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08968, pruned_loss=0.01167, audio_tagging_loss=0.008458, over 3045915.48 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:37:13,614 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580650 2023-11-29 07:37:37,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-29 07:37:47,398 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 07:37:48,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.986e+01 9.811e+01 1.065e+02 1.473e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 07:37:54,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3871186.6666666665, ans=0.125 2023-11-29 07:38:16,943 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3550, loss[loss=0.05181, simple_loss=0.0686, pruned_loss=0.009271, audio_tagging_loss=0.008241, over 15446.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08938, pruned_loss=0.01172, audio_tagging_loss=0.008448, over 3050456.16 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:38:17,026 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580700 2023-11-29 07:38:25,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3871320.0, ans=0.0 2023-11-29 07:38:26,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3871320.0, ans=0.2 2023-11-29 07:38:34,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. limit=10.0 2023-11-29 07:38:47,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3871453.3333333335, ans=0.2 2023-11-29 07:38:49,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3871453.3333333335, ans=0.1 2023-11-29 07:38:53,334 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:39:00,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3871520.0, ans=0.125 2023-11-29 07:39:16,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2023-11-29 07:39:18,696 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3600, loss[loss=0.07873, simple_loss=0.1148, pruned_loss=0.01451, audio_tagging_loss=0.006839, over 15959.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08947, pruned_loss=0.01179, audio_tagging_loss=0.008426, over 3051150.82 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:39:18,797 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580750 2023-11-29 07:39:47,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3871786.6666666665, ans=0.0 2023-11-29 07:39:51,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.727e+01 9.343e+01 1.017e+02 1.458e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-29 07:39:52,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3871786.6666666665, ans=0.125 2023-11-29 07:40:19,951 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3650, loss[loss=0.05058, simple_loss=0.06544, pruned_loss=0.005685, audio_tagging_loss=0.01218, over 15558.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08984, pruned_loss=0.01183, audio_tagging_loss=0.008322, over 3052927.69 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:40:20,027 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580800 2023-11-29 07:40:21,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=22.5 2023-11-29 07:40:25,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3871986.6666666665, ans=0.1 2023-11-29 07:40:38,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3872053.3333333335, ans=0.125 2023-11-29 07:40:55,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3872120.0, ans=0.0 2023-11-29 07:41:21,612 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3700, loss[loss=0.05837, simple_loss=0.08068, pruned_loss=0.008917, audio_tagging_loss=0.009109, over 14617.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09063, pruned_loss=0.01219, audio_tagging_loss=0.008302, over 3053849.79 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:41:21,684 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580850 2023-11-29 07:41:26,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3872320.0, ans=0.2 2023-11-29 07:41:30,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3872320.0, ans=0.95 2023-11-29 07:41:54,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3872453.3333333335, ans=0.125 2023-11-29 07:41:56,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.236e+01 9.960e+01 1.067e+02 1.392e+02, threshold=1.992e+02, percent-clipped=0.0 2023-11-29 07:42:12,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3872586.6666666665, ans=0.1 2023-11-29 07:42:24,368 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3750, loss[loss=0.06377, simple_loss=0.09195, pruned_loss=0.01312, audio_tagging_loss=0.004669, over 14329.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.0907, pruned_loss=0.01215, audio_tagging_loss=0.008395, over 3054610.02 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:42:24,481 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580900 2023-11-29 07:42:38,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3872720.0, ans=0.125 2023-11-29 07:42:46,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3872720.0, ans=0.04949747468305833 2023-11-29 07:42:47,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-29 07:42:52,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3872786.6666666665, ans=0.0 2023-11-29 07:43:09,379 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:43:19,669 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:43:26,313 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3800, loss[loss=0.06494, simple_loss=0.0899, pruned_loss=0.0138, audio_tagging_loss=0.006192, over 14814.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.09017, pruned_loss=0.01194, audio_tagging_loss=0.008465, over 3054121.79 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:43:26,416 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 580950 2023-11-29 07:44:00,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3873120.0, ans=0.125 2023-11-29 07:44:01,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.972e+01 9.513e+01 1.036e+02 1.364e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 07:44:02,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-29 07:44:16,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0 2023-11-29 07:44:23,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3873253.3333333335, ans=0.0 2023-11-29 07:44:28,027 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3850, loss[loss=0.04706, simple_loss=0.06499, pruned_loss=0.005381, audio_tagging_loss=0.009186, over 14668.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08923, pruned_loss=0.01168, audio_tagging_loss=0.008484, over 3050371.51 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:44:28,102 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581000 2023-11-29 07:44:51,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3873386.6666666665, ans=0.125 2023-11-29 07:45:03,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3873453.3333333335, ans=0.125 2023-11-29 07:45:14,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3873520.0, ans=0.0 2023-11-29 07:45:30,913 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3900, loss[loss=0.0707, simple_loss=0.08559, pruned_loss=0.01804, audio_tagging_loss=0.009868, over 13197.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08937, pruned_loss=0.01164, audio_tagging_loss=0.008544, over 3049092.93 frames. 
], batch size: 50, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:45:31,003 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581050 2023-11-29 07:45:51,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3873720.0, ans=0.2 2023-11-29 07:46:04,755 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.794e+01 9.412e+01 1.023e+02 1.323e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 07:46:31,865 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 3950, loss[loss=0.06046, simple_loss=0.08235, pruned_loss=0.01085, audio_tagging_loss=0.00843, over 14312.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08933, pruned_loss=0.01167, audio_tagging_loss=0.008615, over 3054108.58 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:46:31,956 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581100 2023-11-29 07:46:52,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-29 07:46:56,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3874120.0, ans=0.1 2023-11-29 07:47:06,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3874120.0, ans=0.0 2023-11-29 07:47:24,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3874253.3333333335, ans=0.125 2023-11-29 07:47:31,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3874320.0, ans=0.2 2023-11-29 07:47:32,135 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4000, loss[loss=0.08153, simple_loss=0.1084, pruned_loss=0.01617, audio_tagging_loss=0.01114, over 14215.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08872, pruned_loss=0.01156, audio_tagging_loss=0.008706, over 3052911.65 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:47:32,229 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581150 2023-11-29 07:47:40,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. 
limit=15.0 2023-11-29 07:47:44,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3874386.6666666665, ans=0.1 2023-11-29 07:47:46,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3874386.6666666665, ans=0.1 2023-11-29 07:48:01,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3874453.3333333335, ans=0.0 2023-11-29 07:48:08,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 9.122e+01 9.589e+01 1.060e+02 2.170e+02, threshold=1.918e+02, percent-clipped=1.0 2023-11-29 07:48:14,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3874520.0, ans=0.0 2023-11-29 07:48:20,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3874586.6666666665, ans=0.09899494936611666 2023-11-29 07:48:25,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3874586.6666666665, ans=0.1 2023-11-29 07:48:25,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-11-29 07:48:27,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3874586.6666666665, ans=0.0 2023-11-29 07:48:33,317 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4050, loss[loss=0.06382, simple_loss=0.08851, pruned_loss=0.01043, audio_tagging_loss=0.009141, over 15390.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08904, pruned_loss=0.01176, audio_tagging_loss=0.008705, over 3052076.25 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:48:33,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581200 2023-11-29 07:48:37,477 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:48:37,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3874653.3333333335, ans=0.125 2023-11-29 07:48:43,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3874653.3333333335, ans=0.0 2023-11-29 07:48:48,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3874720.0, ans=0.04949747468305833 2023-11-29 07:49:01,850 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:49:13,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. 
limit=15.0 2023-11-29 07:49:31,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3874920.0, ans=0.2 2023-11-29 07:49:35,697 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4100, loss[loss=0.06534, simple_loss=0.08462, pruned_loss=0.0139, audio_tagging_loss=0.009132, over 14606.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08931, pruned_loss=0.01184, audio_tagging_loss=0.008744, over 3053442.96 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:49:35,795 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581250 2023-11-29 07:49:59,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2023-11-29 07:50:06,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3875120.0, ans=0.1 2023-11-29 07:50:11,083 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.112e+01 9.700e+01 1.031e+02 1.226e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 07:50:35,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3875320.0, ans=0.025 2023-11-29 07:50:36,470 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4150, loss[loss=0.04577, simple_loss=0.06005, pruned_loss=0.008121, audio_tagging_loss=0.007621, over 14369.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08962, pruned_loss=0.01184, audio_tagging_loss=0.008655, over 3041545.56 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:50:36,571 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581300 2023-11-29 07:50:42,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3875320.0, ans=0.125 2023-11-29 07:50:43,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3875320.0, ans=0.2 2023-11-29 07:50:47,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3875386.6666666665, ans=0.1 2023-11-29 07:51:04,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2023-11-29 07:51:10,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3875453.3333333335, ans=0.2 2023-11-29 07:51:11,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3875453.3333333335, ans=0.125 2023-11-29 07:51:22,066 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:51:23,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. 
limit=10.0 2023-11-29 07:51:36,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3875653.3333333335, ans=0.0 2023-11-29 07:51:37,788 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4200, loss[loss=0.07997, simple_loss=0.1164, pruned_loss=0.01537, audio_tagging_loss=0.006408, over 14965.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08856, pruned_loss=0.01175, audio_tagging_loss=0.008609, over 3041499.20 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:51:37,862 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581350 2023-11-29 07:52:06,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3875786.6666666665, ans=0.125 2023-11-29 07:52:11,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3875786.6666666665, ans=0.125 2023-11-29 07:52:13,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.081e+01 9.650e+01 1.017e+02 1.202e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 07:52:22,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3875853.3333333335, ans=0.2 2023-11-29 07:52:39,524 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4250, loss[loss=0.04968, simple_loss=0.07511, pruned_loss=0.004481, audio_tagging_loss=0.007645, over 14977.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08907, pruned_loss=0.01188, audio_tagging_loss=0.008473, over 3044371.93 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:52:39,597 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581400 2023-11-29 07:52:44,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3875986.6666666665, ans=0.125 2023-11-29 07:52:54,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2023-11-29 07:53:01,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3876053.3333333335, ans=0.125 2023-11-29 07:53:23,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3876186.6666666665, ans=0.125 2023-11-29 07:53:41,494 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4300, loss[loss=0.04951, simple_loss=0.06375, pruned_loss=0.00796, audio_tagging_loss=0.009674, over 15970.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09092, pruned_loss=0.01212, audio_tagging_loss=0.008286, over 3044291.16 frames. 
], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:53:41,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581450 2023-11-29 07:54:14,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3876453.3333333335, ans=0.125 2023-11-29 07:54:16,689 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 9.277e+01 9.932e+01 1.054e+02 1.240e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 07:54:33,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3876586.6666666665, ans=0.125 2023-11-29 07:54:42,939 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4350, loss[loss=0.07941, simple_loss=0.09843, pruned_loss=0.0189, audio_tagging_loss=0.01129, over 14421.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09093, pruned_loss=0.01222, audio_tagging_loss=0.008263, over 3045297.25 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:54:43,021 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581500 2023-11-29 07:54:47,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3876653.3333333335, ans=0.125 2023-11-29 07:54:58,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3876720.0, ans=0.0 2023-11-29 07:55:06,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2023-11-29 07:55:07,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3876786.6666666665, ans=0.125 2023-11-29 07:55:15,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3876786.6666666665, ans=0.0 2023-11-29 07:55:24,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.34 vs. limit=15.0 2023-11-29 07:55:26,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3876853.3333333335, ans=0.0 2023-11-29 07:55:28,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3876853.3333333335, ans=0.0 2023-11-29 07:55:45,005 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4400, loss[loss=0.06737, simple_loss=0.09874, pruned_loss=0.01046, audio_tagging_loss=0.007537, over 15458.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09145, pruned_loss=0.01231, audio_tagging_loss=0.008205, over 3046086.98 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:55:45,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581550 2023-11-29 07:55:59,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3877053.3333333335, ans=0.125 2023-11-29 07:56:15,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.51 vs. 
limit=10.0 2023-11-29 07:56:21,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 9.242e+01 9.842e+01 1.066e+02 1.310e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 07:56:46,485 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4450, loss[loss=0.07126, simple_loss=0.09379, pruned_loss=0.01627, audio_tagging_loss=0.008101, over 14491.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09054, pruned_loss=0.01227, audio_tagging_loss=0.008233, over 3046704.01 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:56:46,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581600 2023-11-29 07:56:47,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3877320.0, ans=0.0 2023-11-29 07:57:17,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3877453.3333333335, ans=0.125 2023-11-29 07:57:23,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2023-11-29 07:57:35,037 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:57:48,375 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4500, loss[loss=0.06444, simple_loss=0.09339, pruned_loss=0.01108, audio_tagging_loss=0.006665, over 15672.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08975, pruned_loss=0.01199, audio_tagging_loss=0.008281, over 3051951.11 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:57:48,474 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581650 2023-11-29 07:57:54,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3877653.3333333335, ans=0.125 2023-11-29 07:58:25,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.167e+01 9.852e+01 1.040e+02 1.276e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 07:58:27,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.35 vs. limit=5.0 2023-11-29 07:58:50,535 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4550, loss[loss=0.07442, simple_loss=0.0941, pruned_loss=0.01661, audio_tagging_loss=0.01076, over 14282.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.0893, pruned_loss=0.01193, audio_tagging_loss=0.00831, over 3056101.68 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:58:50,614 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581700 2023-11-29 07:58:56,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.60 vs. 
limit=12.0 2023-11-29 07:59:19,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3878120.0, ans=0.04949747468305833 2023-11-29 07:59:23,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3878120.0, ans=0.2 2023-11-29 07:59:37,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3878186.6666666665, ans=0.0 2023-11-29 07:59:38,644 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:59:41,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.55 vs. limit=10.0 2023-11-29 07:59:51,480 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4600, loss[loss=0.07364, simple_loss=0.1056, pruned_loss=0.01291, audio_tagging_loss=0.007922, over 15947.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08853, pruned_loss=0.01172, audio_tagging_loss=0.008503, over 3056886.79 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:59:51,576 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581750 2023-11-29 07:59:55,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3878320.0, ans=0.2 2023-11-29 08:00:08,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3878386.6666666665, ans=0.2 2023-11-29 08:00:09,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3878386.6666666665, ans=0.09899494936611666 2023-11-29 08:00:10,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3878386.6666666665, ans=0.0 2023-11-29 08:00:29,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 8.974e+01 9.623e+01 1.050e+02 1.439e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 08:00:29,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3878520.0, ans=0.0 2023-11-29 08:00:52,987 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4650, loss[loss=0.0666, simple_loss=0.1004, pruned_loss=0.009028, audio_tagging_loss=0.007364, over 14770.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08931, pruned_loss=0.01193, audio_tagging_loss=0.008572, over 3049348.71 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:00:53,051 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581800 2023-11-29 08:01:19,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3878786.6666666665, ans=0.125 2023-11-29 08:01:51,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.88 vs. 
limit=12.0 2023-11-29 08:01:56,864 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4700, loss[loss=0.07403, simple_loss=0.1034, pruned_loss=0.01461, audio_tagging_loss=0.007737, over 15267.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08915, pruned_loss=0.01184, audio_tagging_loss=0.008659, over 3048910.83 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:01:56,980 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581850 2023-11-29 08:02:23,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3879120.0, ans=0.125 2023-11-29 08:02:33,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 9.091e+01 9.646e+01 1.031e+02 1.253e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 08:02:38,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3879186.6666666665, ans=0.07 2023-11-29 08:02:58,725 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4750, loss[loss=0.04353, simple_loss=0.05065, pruned_loss=0.006558, audio_tagging_loss=0.01165, over 15580.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08894, pruned_loss=0.01183, audio_tagging_loss=0.008761, over 3049630.58 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:02:58,814 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581900 2023-11-29 08:02:58,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3879320.0, ans=0.0 2023-11-29 08:03:34,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-29 08:03:34,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3879520.0, ans=0.2 2023-11-29 08:03:57,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3879586.6666666665, ans=0.0 2023-11-29 08:03:59,270 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4800, loss[loss=0.07306, simple_loss=0.0988, pruned_loss=0.01474, audio_tagging_loss=0.008916, over 14819.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08848, pruned_loss=0.01166, audio_tagging_loss=0.008856, over 3045590.92 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:03:59,359 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 581950 2023-11-29 08:04:01,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3879653.3333333335, ans=0.125 2023-11-29 08:04:09,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3879653.3333333335, ans=0.2 2023-11-29 08:04:31,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3879786.6666666665, ans=0.0 2023-11-29 08:04:34,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. 
limit=15.0 2023-11-29 08:04:36,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 9.178e+01 9.692e+01 1.041e+02 1.280e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 08:04:47,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3879920.0, ans=0.025 2023-11-29 08:04:55,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3879920.0, ans=0.2 2023-11-29 08:04:58,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3879920.0, ans=0.125 2023-11-29 08:05:01,441 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4850, loss[loss=0.04913, simple_loss=0.06361, pruned_loss=0.006785, audio_tagging_loss=0.01054, over 16662.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08776, pruned_loss=0.01159, audio_tagging_loss=0.009014, over 3045452.00 frames. ], batch size: 65, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:05:02,213 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582000 2023-11-29 08:05:07,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3879986.6666666665, ans=0.0 2023-11-29 08:05:23,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3880053.3333333335, ans=0.125 2023-11-29 08:05:57,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-29 08:06:04,485 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4900, loss[loss=0.05278, simple_loss=0.07093, pruned_loss=0.008086, audio_tagging_loss=0.009225, over 15645.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08744, pruned_loss=0.01158, audio_tagging_loss=0.008984, over 3036274.38 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:06:04,588 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582050 2023-11-29 08:06:11,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.73 vs. limit=15.0 2023-11-29 08:06:13,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=15.0 2023-11-29 08:06:13,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3880320.0, ans=0.125 2023-11-29 08:06:43,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.902e+01 9.348e+01 9.931e+01 1.050e+02 1.310e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 08:06:43,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-11-29 08:06:44,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3880520.0, ans=0.125 2023-11-29 08:07:05,330 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 4950, loss[loss=0.08359, simple_loss=0.1156, pruned_loss=0.01803, audio_tagging_loss=0.007784, over 15170.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08811, pruned_loss=0.01165, audio_tagging_loss=0.008846, over 3032862.14 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:07:05,404 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582100 2023-11-29 08:07:05,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3880653.3333333335, ans=10.0 2023-11-29 08:07:18,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-29 08:07:26,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3880720.0, ans=0.125 2023-11-29 08:07:26,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.08 vs. limit=10.0 2023-11-29 08:07:32,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3880786.6666666665, ans=0.1 2023-11-29 08:08:07,541 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5000, loss[loss=0.07035, simple_loss=0.09334, pruned_loss=0.01299, audio_tagging_loss=0.01069, over 15190.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08853, pruned_loss=0.01171, audio_tagging_loss=0.008665, over 3045966.62 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:08:07,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582150 2023-11-29 08:08:12,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=12.0 2023-11-29 08:08:19,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3881053.3333333335, ans=0.125 2023-11-29 08:08:24,135 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:08:42,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=12.0 2023-11-29 08:08:44,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3881186.6666666665, ans=0.035 2023-11-29 08:08:45,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.159e+01 9.676e+01 1.038e+02 1.226e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 08:08:50,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=12.0 2023-11-29 08:09:06,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.49 vs. limit=15.0 2023-11-29 08:09:10,347 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5050, loss[loss=0.07418, simple_loss=0.1082, pruned_loss=0.01247, audio_tagging_loss=0.007614, over 15014.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08874, pruned_loss=0.0117, audio_tagging_loss=0.008604, over 3040356.74 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:09:10,437 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582200 2023-11-29 08:09:22,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.35 vs. 
limit=15.0 2023-11-29 08:09:26,053 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:09:27,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3881386.6666666665, ans=0.0 2023-11-29 08:09:40,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=22.5 2023-11-29 08:09:42,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3881453.3333333335, ans=0.1 2023-11-29 08:09:43,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3881453.3333333335, ans=0.1 2023-11-29 08:10:11,779 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5100, loss[loss=0.04594, simple_loss=0.0642, pruned_loss=0.007405, audio_tagging_loss=0.006436, over 14878.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08822, pruned_loss=0.01176, audio_tagging_loss=0.00851, over 3039323.35 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:10:11,868 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582250 2023-11-29 08:10:12,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3881653.3333333335, ans=0.0 2023-11-29 08:10:13,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3881653.3333333335, ans=0.125 2023-11-29 08:10:21,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3881653.3333333335, ans=0.125 2023-11-29 08:10:36,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3881786.6666666665, ans=0.0 2023-11-29 08:10:49,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 8.838e+01 9.435e+01 1.031e+02 1.429e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-29 08:11:13,109 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5150, loss[loss=0.06808, simple_loss=0.09379, pruned_loss=0.01405, audio_tagging_loss=0.007137, over 15173.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08838, pruned_loss=0.01173, audio_tagging_loss=0.008439, over 3044835.16 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:11:13,182 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582300 2023-11-29 08:11:24,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3881986.6666666665, ans=0.125 2023-11-29 08:11:36,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3882053.3333333335, ans=0.125 2023-11-29 08:11:36,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3882053.3333333335, ans=0.125 2023-11-29 08:12:00,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3882186.6666666665, ans=0.1 2023-11-29 08:12:15,502 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5200, loss[loss=0.06472, simple_loss=0.08981, pruned_loss=0.01245, audio_tagging_loss=0.007357, over 15101.00 frames. 
], tot_loss[loss=0.06462, simple_loss=0.08881, pruned_loss=0.01182, audio_tagging_loss=0.008397, over 3039510.68 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:12:15,584 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582350 2023-11-29 08:12:16,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3882320.0, ans=0.025 2023-11-29 08:12:38,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-29 08:12:39,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3882453.3333333335, ans=0.125 2023-11-29 08:12:52,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 8.958e+01 9.640e+01 1.041e+02 1.476e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 08:12:58,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3882520.0, ans=0.125 2023-11-29 08:13:16,411 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5250, loss[loss=0.06735, simple_loss=0.1003, pruned_loss=0.01029, audio_tagging_loss=0.006923, over 15814.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.0886, pruned_loss=0.0118, audio_tagging_loss=0.008438, over 3039830.15 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:13:16,491 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582400 2023-11-29 08:13:20,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3882653.3333333335, ans=0.2 2023-11-29 08:13:22,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3882653.3333333335, ans=0.5 2023-11-29 08:13:45,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3882786.6666666665, ans=0.125 2023-11-29 08:13:54,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3882853.3333333335, ans=0.125 2023-11-29 08:14:02,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3882853.3333333335, ans=0.1 2023-11-29 08:14:18,864 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5300, loss[loss=0.06148, simple_loss=0.08259, pruned_loss=0.01018, audio_tagging_loss=0.01001, over 15321.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08893, pruned_loss=0.01187, audio_tagging_loss=0.008458, over 3036275.39 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:14:18,945 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582450 2023-11-29 08:14:30,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. 
limit=15.0 2023-11-29 08:14:36,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3883053.3333333335, ans=0.07 2023-11-29 08:14:42,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3883120.0, ans=0.125 2023-11-29 08:14:47,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3883120.0, ans=0.05 2023-11-29 08:14:57,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.980e+01 9.053e+01 9.676e+01 1.034e+02 1.415e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 08:15:20,464 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5350, loss[loss=0.06602, simple_loss=0.09675, pruned_loss=0.01183, audio_tagging_loss=0.005818, over 15762.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08927, pruned_loss=0.01189, audio_tagging_loss=0.008445, over 3035069.15 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:15:20,553 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582500 2023-11-29 08:15:24,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3883320.0, ans=0.0 2023-11-29 08:15:26,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3883320.0, ans=0.95 2023-11-29 08:15:29,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3883320.0, ans=0.125 2023-11-29 08:15:30,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3883320.0, ans=0.0 2023-11-29 08:15:34,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3883386.6666666665, ans=0.125 2023-11-29 08:15:35,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3883386.6666666665, ans=0.2 2023-11-29 08:15:40,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-11-29 08:16:07,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3883520.0, ans=0.1 2023-11-29 08:16:12,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3883586.6666666665, ans=0.0 2023-11-29 08:16:20,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3883653.3333333335, ans=0.015 2023-11-29 08:16:21,970 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5400, loss[loss=0.07036, simple_loss=0.09757, pruned_loss=0.01234, audio_tagging_loss=0.009237, over 14378.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08971, pruned_loss=0.01198, audio_tagging_loss=0.008446, over 3032987.22 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:16:22,069 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582550 2023-11-29 08:16:30,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3883653.3333333335, ans=0.09899494936611666 2023-11-29 08:16:41,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3883720.0, ans=0.0 2023-11-29 08:16:56,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3883786.6666666665, ans=0.0 2023-11-29 08:17:01,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.077e+01 9.215e+01 9.741e+01 1.047e+02 1.328e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 08:17:02,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3883853.3333333335, ans=0.09899494936611666 2023-11-29 08:17:04,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3883853.3333333335, ans=0.125 2023-11-29 08:17:23,097 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5450, loss[loss=0.06576, simple_loss=0.0874, pruned_loss=0.01369, audio_tagging_loss=0.008378, over 15617.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08994, pruned_loss=0.01209, audio_tagging_loss=0.008482, over 3038340.87 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:17:23,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582600 2023-11-29 08:17:41,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3884053.3333333335, ans=0.0 2023-11-29 08:17:58,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3884120.0, ans=0.2 2023-11-29 08:18:24,667 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5500, loss[loss=0.06271, simple_loss=0.07265, pruned_loss=0.01093, audio_tagging_loss=0.01545, over 15923.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09027, pruned_loss=0.01223, audio_tagging_loss=0.008443, over 3046718.12 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:18:24,740 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582650 2023-11-29 08:18:28,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3884320.0, ans=0.125 2023-11-29 08:18:34,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3884320.0, ans=0.0 2023-11-29 08:18:44,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3884386.6666666665, ans=0.0 2023-11-29 08:19:00,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3884520.0, ans=0.0 2023-11-29 08:19:03,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 9.075e+01 9.683e+01 1.060e+02 1.497e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-29 08:19:04,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3884520.0, ans=0.0 2023-11-29 08:19:11,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2023-11-29 08:19:12,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3884586.6666666665, ans=0.0 2023-11-29 08:19:13,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3884586.6666666665, ans=0.0 2023-11-29 08:19:25,574 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5550, loss[loss=0.0613, simple_loss=0.08341, pruned_loss=0.009682, audio_tagging_loss=0.009919, over 16020.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08996, pruned_loss=0.01201, audio_tagging_loss=0.008563, over 3050490.47 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:19:25,651 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582700 2023-11-29 08:19:25,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3884653.3333333335, ans=0.0 2023-11-29 08:19:34,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3884653.3333333335, ans=0.125 2023-11-29 08:19:44,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3884720.0, ans=0.125 2023-11-29 08:20:22,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2023-11-29 08:20:26,552 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5600, loss[loss=0.05912, simple_loss=0.07372, pruned_loss=0.01139, audio_tagging_loss=0.01088, over 16776.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.09019, pruned_loss=0.01194, audio_tagging_loss=0.008612, over 3050719.88 frames. ], batch size: 64, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:20:26,654 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582750 2023-11-29 08:20:28,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.44 vs. 
limit=22.5 2023-11-29 08:20:29,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3884986.6666666665, ans=0.125 2023-11-29 08:20:35,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5 2023-11-29 08:20:38,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=12.0 2023-11-29 08:20:42,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3885053.3333333335, ans=0.125 2023-11-29 08:21:01,293 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:21:02,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2023-11-29 08:21:04,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3885186.6666666665, ans=0.2 2023-11-29 08:21:05,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3885186.6666666665, ans=0.125 2023-11-29 08:21:06,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.841e+01 9.122e+01 9.786e+01 1.074e+02 1.432e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 08:21:10,437 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:21:28,643 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5650, loss[loss=0.08001, simple_loss=0.1165, pruned_loss=0.01425, audio_tagging_loss=0.007513, over 15792.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.09011, pruned_loss=0.01183, audio_tagging_loss=0.008688, over 3053421.54 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:21:28,719 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582800 2023-11-29 08:22:10,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3885520.0, ans=0.125 2023-11-29 08:22:16,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3885520.0, ans=0.1 2023-11-29 08:22:23,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3885586.6666666665, ans=0.0 2023-11-29 08:22:29,890 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5700, loss[loss=0.07539, simple_loss=0.1086, pruned_loss=0.01377, audio_tagging_loss=0.007323, over 15211.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08912, pruned_loss=0.01166, audio_tagging_loss=0.008754, over 3044688.45 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:22:29,973 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582850 2023-11-29 08:23:03,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3885786.6666666665, ans=0.0 2023-11-29 08:23:08,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3885853.3333333335, ans=0.0 2023-11-29 08:23:09,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3885853.3333333335, ans=0.0 2023-11-29 08:23:10,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 9.240e+01 1.005e+02 1.081e+02 1.357e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-29 08:23:28,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3885920.0, ans=0.125 2023-11-29 08:23:31,318 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5750, loss[loss=0.07335, simple_loss=0.1054, pruned_loss=0.01285, audio_tagging_loss=0.00779, over 15013.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08815, pruned_loss=0.01153, audio_tagging_loss=0.00869, over 3045635.71 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:23:31,402 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582900 2023-11-29 08:23:34,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3885986.6666666665, ans=0.125 2023-11-29 08:23:46,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3886053.3333333335, ans=0.0 2023-11-29 08:23:55,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.95 vs. limit=10.0 2023-11-29 08:24:22,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3886253.3333333335, ans=0.0 2023-11-29 08:24:23,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3886253.3333333335, ans=0.2 2023-11-29 08:24:32,446 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5800, loss[loss=0.04198, simple_loss=0.05034, pruned_loss=0.007402, audio_tagging_loss=0.00941, over 14608.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08904, pruned_loss=0.01172, audio_tagging_loss=0.008592, over 3045178.53 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:24:32,535 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 582950 2023-11-29 08:24:37,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3886320.0, ans=0.125 2023-11-29 08:24:52,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3886386.6666666665, ans=0.125 2023-11-29 08:25:07,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3886453.3333333335, ans=0.0 2023-11-29 08:25:13,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 9.182e+01 9.577e+01 1.050e+02 1.266e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 08:25:26,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3886586.6666666665, ans=0.07 2023-11-29 08:25:26,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=12.0 2023-11-29 08:25:33,492 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5850, loss[loss=0.083, simple_loss=0.1218, pruned_loss=0.01697, audio_tagging_loss=0.005141, over 16264.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.09, pruned_loss=0.01186, audio_tagging_loss=0.008408, over 3050445.59 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:25:33,560 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583000 2023-11-29 08:25:36,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3886653.3333333335, ans=22.5 2023-11-29 08:25:43,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3886653.3333333335, ans=0.125 2023-11-29 08:26:13,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3886853.3333333335, ans=0.0 2023-11-29 08:26:15,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2023-11-29 08:26:19,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3886853.3333333335, ans=0.125 2023-11-29 08:26:26,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3886920.0, ans=0.0 2023-11-29 08:26:36,833 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5900, loss[loss=0.09406, simple_loss=0.1242, pruned_loss=0.02419, audio_tagging_loss=0.007789, over 15439.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08991, pruned_loss=0.01202, audio_tagging_loss=0.008468, over 3050749.19 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:26:36,952 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583050 2023-11-29 08:26:55,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-29 08:26:57,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.21 vs. 
limit=22.5 2023-11-29 08:26:58,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=12.0 2023-11-29 08:27:03,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3887120.0, ans=0.2 2023-11-29 08:27:12,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3887120.0, ans=0.125 2023-11-29 08:27:17,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 9.266e+01 9.950e+01 1.077e+02 1.290e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-29 08:27:31,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3887253.3333333335, ans=0.07 2023-11-29 08:27:38,605 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 5950, loss[loss=0.07914, simple_loss=0.1036, pruned_loss=0.01793, audio_tagging_loss=0.009398, over 15700.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08934, pruned_loss=0.0119, audio_tagging_loss=0.008418, over 3051164.73 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:27:38,721 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583100 2023-11-29 08:27:38,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3887320.0, ans=0.2 2023-11-29 08:27:45,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3887320.0, ans=0.0 2023-11-29 08:27:59,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2023-11-29 08:28:25,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3887520.0, ans=0.2 2023-11-29 08:28:40,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5 2023-11-29 08:28:40,691 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6000, loss[loss=0.0675, simple_loss=0.0936, pruned_loss=0.01154, audio_tagging_loss=0.009167, over 15424.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08951, pruned_loss=0.01198, audio_tagging_loss=0.00837, over 3042430.35 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:28:40,692 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 08:29:00,101 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8807, 5.7529, 5.6286, 5.4742], device='cuda:3') 2023-11-29 08:29:01,645 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8730, 2.2391, 2.6688, 2.4209], device='cuda:3') 2023-11-29 08:29:14,262 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8027, 4.9751, 5.0970, 4.9209], device='cuda:3') 2023-11-29 08:29:20,053 INFO [train_asr.py:1267] (3/4) Epoch 49, validation: loss=0.05758, simple_loss=0.05041, pruned_loss=0.005303, audio_tagging_loss=0.02707, over 4681554.00 frames. 
2023-11-29 08:29:20,054 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 08:29:20,149 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583150 2023-11-29 08:29:23,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3887653.3333333335, ans=0.0 2023-11-29 08:29:25,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3887653.3333333335, ans=0.2 2023-11-29 08:29:38,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3887720.0, ans=0.0 2023-11-29 08:30:00,786 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 8.890e+01 9.631e+01 1.036e+02 1.251e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 08:30:05,509 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:30:13,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3887920.0, ans=0.1 2023-11-29 08:30:20,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3887986.6666666665, ans=0.1 2023-11-29 08:30:21,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3887986.6666666665, ans=0.125 2023-11-29 08:30:22,506 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6050, loss[loss=0.0792, simple_loss=0.1179, pruned_loss=0.01481, audio_tagging_loss=0.005449, over 14717.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08968, pruned_loss=0.012, audio_tagging_loss=0.008327, over 3036279.35 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:30:22,582 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583200 2023-11-29 08:31:17,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3888253.3333333335, ans=0.1 2023-11-29 08:31:17,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.56 vs. limit=10.0 2023-11-29 08:31:24,410 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6100, loss[loss=0.06074, simple_loss=0.08765, pruned_loss=0.00976, audio_tagging_loss=0.007158, over 14924.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08973, pruned_loss=0.01196, audio_tagging_loss=0.008254, over 3038560.06 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:31:24,494 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583250 2023-11-29 08:31:41,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3888386.6666666665, ans=0.0 2023-11-29 08:32:03,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3888520.0, ans=0.125 2023-11-29 08:32:05,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 9.225e+01 1.004e+02 1.045e+02 1.351e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-29 08:32:06,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3888520.0, ans=0.0 2023-11-29 08:32:08,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.29 vs. limit=10.0 2023-11-29 08:32:09,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3888520.0, ans=0.125 2023-11-29 08:32:14,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-29 08:32:19,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3888586.6666666665, ans=0.125 2023-11-29 08:32:23,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3888586.6666666665, ans=0.125 2023-11-29 08:32:25,403 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6150, loss[loss=0.06257, simple_loss=0.0877, pruned_loss=0.01307, audio_tagging_loss=0.005652, over 15369.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08877, pruned_loss=0.01194, audio_tagging_loss=0.008374, over 3030807.10 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:32:25,490 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583300 2023-11-29 08:32:26,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3888653.3333333335, ans=0.1 2023-11-29 08:33:26,854 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6200, loss[loss=0.0546, simple_loss=0.07484, pruned_loss=0.007619, audio_tagging_loss=0.009561, over 15294.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08811, pruned_loss=0.01172, audio_tagging_loss=0.008493, over 3033018.12 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:33:26,956 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583350 2023-11-29 08:33:36,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3888986.6666666665, ans=0.0 2023-11-29 08:33:42,363 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:33:49,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.61 vs. 
limit=12.0 2023-11-29 08:34:08,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 9.108e+01 9.848e+01 1.055e+02 1.947e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 08:34:20,394 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:34:29,488 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6250, loss[loss=0.07383, simple_loss=0.1027, pruned_loss=0.01544, audio_tagging_loss=0.007058, over 15398.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08826, pruned_loss=0.01172, audio_tagging_loss=0.008672, over 3041637.03 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:34:29,607 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583400 2023-11-29 08:34:44,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3889386.6666666665, ans=0.1 2023-11-29 08:34:45,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3889386.6666666665, ans=0.0 2023-11-29 08:34:54,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-29 08:34:55,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2023-11-29 08:34:58,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3889453.3333333335, ans=0.0 2023-11-29 08:35:30,172 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6300, loss[loss=0.067, simple_loss=0.08695, pruned_loss=0.01361, audio_tagging_loss=0.009917, over 15380.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08915, pruned_loss=0.01168, audio_tagging_loss=0.008743, over 3044765.34 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:35:30,306 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583450 2023-11-29 08:35:44,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-29 08:35:56,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3889786.6666666665, ans=0.1 2023-11-29 08:36:05,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.26 vs. limit=22.5 2023-11-29 08:36:13,128 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 9.155e+01 9.755e+01 1.058e+02 1.266e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 08:36:16,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2023-11-29 08:36:24,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3889920.0, ans=0.1 2023-11-29 08:36:32,757 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6350, loss[loss=0.07079, simple_loss=0.09831, pruned_loss=0.01162, audio_tagging_loss=0.01001, over 14752.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08857, pruned_loss=0.01176, audio_tagging_loss=0.008932, over 3043588.01 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:36:32,836 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583500 2023-11-29 08:36:39,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3889986.6666666665, ans=0.125 2023-11-29 08:36:50,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3890053.3333333335, ans=0.1 2023-11-29 08:37:22,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3890253.3333333335, ans=0.125 2023-11-29 08:37:31,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3890253.3333333335, ans=0.09899494936611666 2023-11-29 08:37:36,009 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6400, loss[loss=0.05858, simple_loss=0.08022, pruned_loss=0.007349, audio_tagging_loss=0.01112, over 14957.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08831, pruned_loss=0.01168, audio_tagging_loss=0.008977, over 3041386.21 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:37:36,093 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583550 2023-11-29 08:37:53,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3890386.6666666665, ans=0.125 2023-11-29 08:38:10,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3890520.0, ans=0.125 2023-11-29 08:38:10,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3890520.0, ans=0.2 2023-11-29 08:38:16,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-11-29 08:38:17,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.981e+01 9.586e+01 1.023e+02 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 08:38:36,781 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6450, loss[loss=0.05509, simple_loss=0.07763, pruned_loss=0.006936, audio_tagging_loss=0.009337, over 14625.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08741, pruned_loss=0.01158, audio_tagging_loss=0.009001, over 3036992.94 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:38:36,880 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583600 2023-11-29 08:38:43,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3890653.3333333335, ans=0.0 2023-11-29 08:39:16,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3890853.3333333335, ans=0.04949747468305833 2023-11-29 08:39:36,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.10 vs. limit=12.0 2023-11-29 08:39:39,029 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6500, loss[loss=0.08185, simple_loss=0.1254, pruned_loss=0.01385, audio_tagging_loss=0.005312, over 15803.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08856, pruned_loss=0.01179, audio_tagging_loss=0.00888, over 3038786.18 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:39:39,125 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583650 2023-11-29 08:40:02,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3891053.3333333335, ans=0.125 2023-11-29 08:40:22,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.191e+01 9.311e+01 1.000e+02 1.077e+02 1.349e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-29 08:40:28,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2023-11-29 08:40:31,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3891253.3333333335, ans=0.125 2023-11-29 08:40:41,241 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6550, loss[loss=0.06468, simple_loss=0.09151, pruned_loss=0.009008, audio_tagging_loss=0.009917, over 15963.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08903, pruned_loss=0.01189, audio_tagging_loss=0.008666, over 3035041.89 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:40:41,318 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583700 2023-11-29 08:41:12,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3891453.3333333335, ans=0.2 2023-11-29 08:41:43,060 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6600, loss[loss=0.06344, simple_loss=0.09257, pruned_loss=0.009139, audio_tagging_loss=0.008014, over 15533.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.0891, pruned_loss=0.0119, audio_tagging_loss=0.008449, over 3033951.45 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:41:43,153 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583750 2023-11-29 08:41:46,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3891653.3333333335, ans=0.125 2023-11-29 08:42:05,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3891720.0, ans=0.125 2023-11-29 08:42:06,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3891786.6666666665, ans=0.125 2023-11-29 08:42:18,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3891786.6666666665, ans=0.1 2023-11-29 08:42:26,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 9.392e+01 1.005e+02 1.057e+02 1.408e+02, threshold=2.010e+02, percent-clipped=0.0 2023-11-29 08:42:30,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3891853.3333333335, ans=0.2 2023-11-29 08:42:41,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3891920.0, ans=0.0 2023-11-29 08:42:45,090 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6650, loss[loss=0.06004, simple_loss=0.07969, pruned_loss=0.01127, audio_tagging_loss=0.008923, over 14906.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08844, pruned_loss=0.01189, audio_tagging_loss=0.008466, over 3035618.96 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:42:45,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583800 2023-11-29 08:42:49,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-29 08:42:59,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3892053.3333333335, ans=0.0 2023-11-29 08:43:10,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-29 08:43:11,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3892120.0, ans=0.05 2023-11-29 08:43:14,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3892120.0, ans=0.1 2023-11-29 08:43:20,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3892120.0, ans=0.0 2023-11-29 08:43:48,136 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6700, loss[loss=0.0868, simple_loss=0.1184, pruned_loss=0.01754, audio_tagging_loss=0.01008, over 14154.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08907, pruned_loss=0.01193, audio_tagging_loss=0.008461, over 3036530.87 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:43:48,234 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583850 2023-11-29 08:44:08,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3892386.6666666665, ans=0.125 2023-11-29 08:44:12,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-29 08:44:16,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-29 08:44:18,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3892453.3333333335, ans=0.125 2023-11-29 08:44:31,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.029e+01 9.606e+01 1.021e+02 1.283e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 08:44:32,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3892520.0, ans=0.2 2023-11-29 08:44:47,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3892586.6666666665, ans=0.125 2023-11-29 08:44:49,260 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6750, loss[loss=0.06013, simple_loss=0.07808, pruned_loss=0.009651, audio_tagging_loss=0.01144, over 14970.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08857, pruned_loss=0.01182, audio_tagging_loss=0.00842, over 3039051.29 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:44:49,348 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583900 2023-11-29 08:44:58,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3892653.3333333335, ans=0.125 2023-11-29 08:45:15,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3892786.6666666665, ans=0.125 2023-11-29 08:45:19,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3892786.6666666665, ans=0.125 2023-11-29 08:45:41,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3892920.0, ans=0.2 2023-11-29 08:45:51,279 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6800, loss[loss=0.0482, simple_loss=0.06103, pruned_loss=0.005886, audio_tagging_loss=0.0118, over 15048.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08802, pruned_loss=0.01171, audio_tagging_loss=0.008433, over 3032903.94 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:45:51,359 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 583950 2023-11-29 08:46:17,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3893120.0, ans=0.125 2023-11-29 08:46:34,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 9.158e+01 9.734e+01 1.076e+02 1.387e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 08:46:51,186 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:46:53,207 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6850, loss[loss=0.05747, simple_loss=0.07797, pruned_loss=0.01065, audio_tagging_loss=0.007837, over 15054.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08816, pruned_loss=0.01169, audio_tagging_loss=0.008386, over 3023742.38 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:46:53,293 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584000 2023-11-29 08:46:58,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3893320.0, ans=0.125 2023-11-29 08:47:03,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3893320.0, ans=0.0 2023-11-29 08:47:05,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3893320.0, ans=0.125 2023-11-29 08:47:08,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3893386.6666666665, ans=0.025 2023-11-29 08:47:13,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3893386.6666666665, ans=0.125 2023-11-29 08:47:25,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3893453.3333333335, ans=0.125 2023-11-29 08:47:26,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3893453.3333333335, ans=0.125 2023-11-29 08:47:29,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3893453.3333333335, ans=0.0 2023-11-29 08:47:31,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3893520.0, ans=0.1 2023-11-29 08:47:37,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3893520.0, ans=0.0 2023-11-29 08:47:48,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3893586.6666666665, ans=0.0 2023-11-29 08:47:53,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2023-11-29 08:47:56,378 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6900, loss[loss=0.04062, simple_loss=0.05556, pruned_loss=0.004571, audio_tagging_loss=0.008265, over 14419.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08864, pruned_loss=0.01176, audio_tagging_loss=0.008347, over 3039691.51 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:47:56,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584050 2023-11-29 08:47:59,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=12.0 2023-11-29 08:48:05,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-29 08:48:28,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.27 vs. 
limit=15.0 2023-11-29 08:48:33,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3893853.3333333335, ans=0.1 2023-11-29 08:48:39,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 9.126e+01 9.494e+01 1.002e+02 1.227e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 08:48:45,170 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:48:48,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3893920.0, ans=0.125 2023-11-29 08:48:58,276 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 6950, loss[loss=0.06362, simple_loss=0.08359, pruned_loss=0.01297, audio_tagging_loss=0.008863, over 16237.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08891, pruned_loss=0.01174, audio_tagging_loss=0.008355, over 3048901.90 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:48:58,364 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584100 2023-11-29 08:49:09,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3894053.3333333335, ans=0.125 2023-11-29 08:49:10,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3894053.3333333335, ans=0.125 2023-11-29 08:49:17,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3894053.3333333335, ans=0.0 2023-11-29 08:49:21,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3894120.0, ans=0.1 2023-11-29 08:49:25,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-11-29 08:49:33,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3894186.6666666665, ans=0.0 2023-11-29 08:49:40,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3894186.6666666665, ans=0.0 2023-11-29 08:49:42,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3894186.6666666665, ans=0.1 2023-11-29 08:49:58,665 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7000, loss[loss=0.06775, simple_loss=0.09548, pruned_loss=0.01141, audio_tagging_loss=0.008599, over 15815.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08904, pruned_loss=0.01162, audio_tagging_loss=0.008415, over 3048172.76 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:49:58,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584150 2023-11-29 08:50:04,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3894320.0, ans=0.2 2023-11-29 08:50:12,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3894386.6666666665, ans=0.1 2023-11-29 08:50:16,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3894386.6666666665, ans=0.0 2023-11-29 08:50:22,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3894453.3333333335, ans=0.0 2023-11-29 08:50:23,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3894453.3333333335, ans=0.0 2023-11-29 08:50:32,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3894453.3333333335, ans=0.07 2023-11-29 08:50:42,686 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.892e+01 9.505e+01 1.031e+02 1.228e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-29 08:50:51,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3894586.6666666665, ans=0.125 2023-11-29 08:51:00,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3894653.3333333335, ans=0.125 2023-11-29 08:51:01,077 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7050, loss[loss=0.05565, simple_loss=0.07652, pruned_loss=0.009958, audio_tagging_loss=0.007429, over 15409.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08972, pruned_loss=0.01179, audio_tagging_loss=0.008475, over 3045599.53 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:51:01,186 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584200 2023-11-29 08:51:09,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3894653.3333333335, ans=0.2 2023-11-29 08:51:15,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3894720.0, ans=0.1 2023-11-29 08:51:16,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.20 vs. 
limit=22.5 2023-11-29 08:51:35,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3894786.6666666665, ans=0.2 2023-11-29 08:51:40,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3894853.3333333335, ans=15.0 2023-11-29 08:51:57,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3894920.0, ans=0.0 2023-11-29 08:51:59,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3894920.0, ans=0.04949747468305833 2023-11-29 08:51:59,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3894920.0, ans=0.2 2023-11-29 08:52:01,511 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7100, loss[loss=0.06555, simple_loss=0.0958, pruned_loss=0.009755, audio_tagging_loss=0.007896, over 15943.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08984, pruned_loss=0.01187, audio_tagging_loss=0.008557, over 3044149.02 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:52:01,597 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584250 2023-11-29 08:52:23,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2023-11-29 08:52:24,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3895053.3333333335, ans=0.125 2023-11-29 08:52:33,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3895120.0, ans=0.125 2023-11-29 08:52:45,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 9.114e+01 9.630e+01 1.033e+02 1.554e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 08:53:03,108 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7150, loss[loss=0.06392, simple_loss=0.08754, pruned_loss=0.0124, audio_tagging_loss=0.007751, over 14607.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08963, pruned_loss=0.01178, audio_tagging_loss=0.008597, over 3040681.11 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:53:03,195 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584300 2023-11-29 08:53:18,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3895386.6666666665, ans=0.035 2023-11-29 08:53:18,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3895386.6666666665, ans=0.5 2023-11-29 08:53:26,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3895453.3333333335, ans=0.1 2023-11-29 08:53:27,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3895453.3333333335, ans=0.0 2023-11-29 08:53:37,978 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.12 vs. 
limit=15.0 2023-11-29 08:53:42,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3895520.0, ans=0.1 2023-11-29 08:53:42,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2023-11-29 08:54:04,765 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7200, loss[loss=0.07099, simple_loss=0.09506, pruned_loss=0.01377, audio_tagging_loss=0.009695, over 15428.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08919, pruned_loss=0.01166, audio_tagging_loss=0.008735, over 3043079.74 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:54:04,839 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584350 2023-11-29 08:54:35,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3895786.6666666665, ans=0.2 2023-11-29 08:54:37,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3895786.6666666665, ans=0.125 2023-11-29 08:54:48,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 8.925e+01 9.835e+01 1.040e+02 1.813e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-29 08:54:52,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3895920.0, ans=0.1 2023-11-29 08:55:05,300 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7250, loss[loss=0.06637, simple_loss=0.08795, pruned_loss=0.01431, audio_tagging_loss=0.008086, over 16244.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08977, pruned_loss=0.01188, audio_tagging_loss=0.008738, over 3047752.94 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:55:05,416 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584400 2023-11-29 08:55:09,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3895986.6666666665, ans=0.125 2023-11-29 08:55:12,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3895986.6666666665, ans=0.125 2023-11-29 08:55:29,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0 2023-11-29 08:55:32,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3896120.0, ans=0.0 2023-11-29 08:55:48,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3896186.6666666665, ans=0.125 2023-11-29 08:56:05,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3896253.3333333335, ans=0.1 2023-11-29 08:56:07,919 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7300, loss[loss=0.08091, simple_loss=0.1064, pruned_loss=0.01909, audio_tagging_loss=0.008618, over 15297.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08936, pruned_loss=0.01187, audio_tagging_loss=0.008655, over 3048375.69 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:56:08,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584450 2023-11-29 08:56:34,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3896453.3333333335, ans=0.125 2023-11-29 08:56:41,519 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:56:44,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3896520.0, ans=0.125 2023-11-29 08:56:47,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3896520.0, ans=0.0 2023-11-29 08:56:51,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.158e+01 9.688e+01 1.038e+02 1.242e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 08:56:51,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3896520.0, ans=0.125 2023-11-29 08:56:51,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2023-11-29 08:57:05,796 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:57:08,961 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7350, loss[loss=0.06443, simple_loss=0.08739, pruned_loss=0.0119, audio_tagging_loss=0.008834, over 14442.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08971, pruned_loss=0.01192, audio_tagging_loss=0.008535, over 3048352.52 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:57:09,073 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584500 2023-11-29 08:57:10,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3896653.3333333335, ans=0.0 2023-11-29 08:57:43,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3896786.6666666665, ans=0.0 2023-11-29 08:58:04,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.99 vs. limit=6.0 2023-11-29 08:58:09,726 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7400, loss[loss=0.07415, simple_loss=0.1022, pruned_loss=0.0153, audio_tagging_loss=0.007771, over 14696.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08884, pruned_loss=0.01187, audio_tagging_loss=0.008441, over 3041947.06 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:58:09,801 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584550 2023-11-29 08:58:22,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-11-29 08:58:29,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.36 vs. 
limit=15.0 2023-11-29 08:58:35,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3897120.0, ans=0.125 2023-11-29 08:58:38,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0 2023-11-29 08:58:54,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 9.320e+01 9.934e+01 1.095e+02 1.320e+02, threshold=1.987e+02, percent-clipped=0.0 2023-11-29 08:59:02,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2023-11-29 08:59:10,432 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7450, loss[loss=0.04898, simple_loss=0.05995, pruned_loss=0.00979, audio_tagging_loss=0.009209, over 16488.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08869, pruned_loss=0.01192, audio_tagging_loss=0.008395, over 3044535.39 frames. ], batch size: 64, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:59:10,507 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584600 2023-11-29 08:59:10,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3897320.0, ans=0.125 2023-11-29 08:59:17,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2023-11-29 08:59:39,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3897453.3333333335, ans=0.0 2023-11-29 08:59:47,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3897520.0, ans=0.125 2023-11-29 08:59:56,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3897520.0, ans=0.125 2023-11-29 09:00:01,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-29 09:00:11,651 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7500, loss[loss=0.07618, simple_loss=0.098, pruned_loss=0.01744, audio_tagging_loss=0.009741, over 15023.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.0888, pruned_loss=0.01193, audio_tagging_loss=0.008401, over 3038008.41 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:00:11,743 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584650 2023-11-29 09:00:33,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3897720.0, ans=0.125 2023-11-29 09:00:39,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3897786.6666666665, ans=0.125 2023-11-29 09:00:57,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 9.073e+01 9.674e+01 1.060e+02 1.310e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 09:01:03,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3897920.0, ans=0.0 2023-11-29 09:01:12,411 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7550, loss[loss=0.07195, simple_loss=0.09963, pruned_loss=0.01321, audio_tagging_loss=0.008933, over 14573.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08881, pruned_loss=0.0119, audio_tagging_loss=0.008346, over 3045192.87 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:01:12,507 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584700 2023-11-29 09:01:21,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2023-11-29 09:01:43,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3898120.0, ans=0.0 2023-11-29 09:01:51,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3898186.6666666665, ans=0.125 2023-11-29 09:01:51,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2023-11-29 09:01:52,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3898186.6666666665, ans=0.125 2023-11-29 09:02:10,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3898253.3333333335, ans=0.0 2023-11-29 09:02:13,869 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7600, loss[loss=0.07729, simple_loss=0.1099, pruned_loss=0.01553, audio_tagging_loss=0.006796, over 14219.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08875, pruned_loss=0.01198, audio_tagging_loss=0.008403, over 3045169.74 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:02:13,963 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584750 2023-11-29 09:02:18,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3898320.0, ans=0.125 2023-11-29 09:02:20,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3898320.0, ans=0.125 2023-11-29 09:02:21,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3898320.0, ans=0.1 2023-11-29 09:02:42,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3898453.3333333335, ans=0.0 2023-11-29 09:02:44,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3898453.3333333335, ans=0.125 2023-11-29 09:03:00,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.018e+01 9.691e+01 1.036e+02 1.365e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 09:03:03,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3898586.6666666665, ans=0.125 2023-11-29 09:03:08,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3898586.6666666665, ans=0.125 2023-11-29 09:03:16,492 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7650, loss[loss=0.07129, simple_loss=0.09291, pruned_loss=0.01648, audio_tagging_loss=0.008361, over 15372.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08889, pruned_loss=0.01194, audio_tagging_loss=0.008409, over 3041416.72 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:03:16,579 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584800 2023-11-29 09:03:22,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=12.0 2023-11-29 09:03:39,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3898786.6666666665, ans=0.125 2023-11-29 09:03:39,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3898786.6666666665, ans=0.1 2023-11-29 09:03:53,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3898853.3333333335, ans=0.125 2023-11-29 09:03:55,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3898853.3333333335, ans=0.125 2023-11-29 09:04:09,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3898920.0, ans=0.125 2023-11-29 09:04:18,687 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7700, loss[loss=0.06597, simple_loss=0.09459, pruned_loss=0.0109, audio_tagging_loss=0.007776, over 15259.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08888, pruned_loss=0.01188, audio_tagging_loss=0.008351, over 3038474.90 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:04:18,794 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584850 2023-11-29 09:05:02,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3899186.6666666665, ans=0.2 2023-11-29 09:05:05,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 9.378e+01 9.780e+01 1.035e+02 1.508e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-29 09:05:19,943 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7750, loss[loss=0.05236, simple_loss=0.06078, pruned_loss=0.00772, audio_tagging_loss=0.01425, over 14739.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08916, pruned_loss=0.01191, audio_tagging_loss=0.008413, over 3034358.02 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:05:20,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584900 2023-11-29 09:05:22,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3899320.0, ans=22.5 2023-11-29 09:05:49,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3899453.3333333335, ans=0.1 2023-11-29 09:05:53,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3899453.3333333335, ans=0.1 2023-11-29 09:06:21,806 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7800, loss[loss=0.06985, simple_loss=0.1007, pruned_loss=0.01027, audio_tagging_loss=0.009237, over 15484.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08938, pruned_loss=0.01192, audio_tagging_loss=0.008523, over 3035842.60 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:06:21,902 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 584950 2023-11-29 09:06:48,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2023-11-29 09:06:48,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. 
limit=12.0 2023-11-29 09:06:49,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3899786.6666666665, ans=0.0 2023-11-29 09:06:52,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3899786.6666666665, ans=0.0 2023-11-29 09:06:55,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3899786.6666666665, ans=0.1 2023-11-29 09:06:56,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3899786.6666666665, ans=0.125 2023-11-29 09:07:00,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3899853.3333333335, ans=0.2 2023-11-29 09:07:08,635 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.073e+01 9.693e+01 1.045e+02 1.224e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 09:07:11,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3899920.0, ans=0.0 2023-11-29 09:07:13,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2023-11-29 09:07:23,381 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7850, loss[loss=0.05618, simple_loss=0.0802, pruned_loss=0.008733, audio_tagging_loss=0.007349, over 15477.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08862, pruned_loss=0.01176, audio_tagging_loss=0.008583, over 3037220.22 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:07:23,474 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585000 2023-11-29 09:07:43,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3900053.3333333335, ans=0.0 2023-11-29 09:07:53,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3900120.0, ans=0.125 2023-11-29 09:08:01,395 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:08:18,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3900253.3333333335, ans=0.2 2023-11-29 09:08:24,375 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7900, loss[loss=0.06155, simple_loss=0.0801, pruned_loss=0.01128, audio_tagging_loss=0.01022, over 15227.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08793, pruned_loss=0.01179, audio_tagging_loss=0.008668, over 3036792.80 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:08:24,489 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585050 2023-11-29 09:08:55,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3900453.3333333335, ans=0.125 2023-11-29 09:08:56,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. 
limit=15.0 2023-11-29 09:08:59,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3900520.0, ans=0.125 2023-11-29 09:09:10,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.168e+01 9.323e+01 9.871e+01 1.069e+02 1.326e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 09:09:13,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3900586.6666666665, ans=0.125 2023-11-29 09:09:14,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3900586.6666666665, ans=10.0 2023-11-29 09:09:23,924 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 7950, loss[loss=0.07128, simple_loss=0.09161, pruned_loss=0.01176, audio_tagging_loss=0.01372, over 14544.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08827, pruned_loss=0.01181, audio_tagging_loss=0.008767, over 3044011.36 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:09:24,004 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585100 2023-11-29 09:09:25,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3900653.3333333335, ans=0.125 2023-11-29 09:09:27,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3900653.3333333335, ans=0.125 2023-11-29 09:09:40,416 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:09:40,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3900720.0, ans=0.125 2023-11-29 09:09:57,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3900786.6666666665, ans=0.0 2023-11-29 09:09:57,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3900786.6666666665, ans=0.125 2023-11-29 09:10:22,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3900920.0, ans=0.125 2023-11-29 09:10:24,887 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8000, loss[loss=0.06241, simple_loss=0.08686, pruned_loss=0.01083, audio_tagging_loss=0.008143, over 16065.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08731, pruned_loss=0.01168, audio_tagging_loss=0.008879, over 3040335.68 frames. 
], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:10:24,966 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585150 2023-11-29 09:10:35,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3900986.6666666665, ans=0.0 2023-11-29 09:10:36,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3901053.3333333335, ans=0.0 2023-11-29 09:10:37,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3901053.3333333335, ans=0.1 2023-11-29 09:10:47,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3901053.3333333335, ans=0.125 2023-11-29 09:10:47,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3901053.3333333335, ans=0.0 2023-11-29 09:11:05,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3901186.6666666665, ans=0.1 2023-11-29 09:11:06,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3901186.6666666665, ans=0.125 2023-11-29 09:11:11,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.013e+01 9.705e+01 1.030e+02 1.171e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-29 09:11:15,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2023-11-29 09:11:25,771 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8050, loss[loss=0.07102, simple_loss=0.09445, pruned_loss=0.01556, audio_tagging_loss=0.008229, over 14605.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08792, pruned_loss=0.01168, audio_tagging_loss=0.008873, over 3036562.15 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:11:25,869 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585200 2023-11-29 09:11:41,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3901386.6666666665, ans=0.015 2023-11-29 09:11:47,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3901386.6666666665, ans=0.125 2023-11-29 09:12:17,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3901586.6666666665, ans=0.125 2023-11-29 09:12:28,025 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8100, loss[loss=0.05973, simple_loss=0.08183, pruned_loss=0.01032, audio_tagging_loss=0.008498, over 15731.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08868, pruned_loss=0.01185, audio_tagging_loss=0.008709, over 3046601.56 frames. 
], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:12:28,117 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585250 2023-11-29 09:12:42,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3901720.0, ans=0.125 2023-11-29 09:12:49,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3901720.0, ans=0.125 2023-11-29 09:12:52,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3901786.6666666665, ans=0.1 2023-11-29 09:12:58,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3901786.6666666665, ans=0.125 2023-11-29 09:13:16,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 9.257e+01 9.923e+01 1.057e+02 1.359e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-29 09:13:23,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3901920.0, ans=0.125 2023-11-29 09:13:29,945 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8150, loss[loss=0.05851, simple_loss=0.07587, pruned_loss=0.01058, audio_tagging_loss=0.009994, over 15047.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08827, pruned_loss=0.0119, audio_tagging_loss=0.008699, over 3045521.87 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:13:30,029 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585300 2023-11-29 09:14:04,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3902120.0, ans=10.0 2023-11-29 09:14:07,937 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:14:14,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2023-11-29 09:14:19,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3902253.3333333335, ans=0.2 2023-11-29 09:14:28,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3902253.3333333335, ans=0.0 2023-11-29 09:14:31,337 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8200, loss[loss=0.07191, simple_loss=0.09947, pruned_loss=0.01218, audio_tagging_loss=0.009988, over 15894.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08854, pruned_loss=0.01186, audio_tagging_loss=0.008715, over 3047707.71 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:14:31,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585350 2023-11-29 09:14:33,615 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 09:14:51,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3902386.6666666665, ans=0.0 2023-11-29 09:14:59,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3902453.3333333335, ans=0.0 2023-11-29 09:15:01,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0 2023-11-29 09:15:19,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.775e+01 9.273e+01 9.882e+01 1.047e+02 1.240e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 09:15:25,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3902586.6666666665, ans=0.015 2023-11-29 09:15:25,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3902586.6666666665, ans=0.2 2023-11-29 09:15:34,280 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8250, loss[loss=0.07115, simple_loss=0.1019, pruned_loss=0.01401, audio_tagging_loss=0.006179, over 15306.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08795, pruned_loss=0.01186, audio_tagging_loss=0.008599, over 3044963.82 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:15:34,400 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585400 2023-11-29 09:15:44,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3902653.3333333335, ans=10.0 2023-11-29 09:15:52,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3902720.0, ans=0.125 2023-11-29 09:16:03,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3902786.6666666665, ans=0.125 2023-11-29 09:16:15,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.95 vs. limit=15.0 2023-11-29 09:16:15,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3902853.3333333335, ans=0.2 2023-11-29 09:16:20,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3902853.3333333335, ans=0.2 2023-11-29 09:16:32,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3902920.0, ans=0.125 2023-11-29 09:16:36,689 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8300, loss[loss=0.05494, simple_loss=0.07533, pruned_loss=0.009542, audio_tagging_loss=0.00773, over 14772.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08881, pruned_loss=0.0121, audio_tagging_loss=0.008505, over 3044653.04 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:16:36,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585450 2023-11-29 09:16:54,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3903053.3333333335, ans=0.07 2023-11-29 09:16:55,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2023-11-29 09:17:14,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3903186.6666666665, ans=0.125 2023-11-29 09:17:22,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3903186.6666666665, ans=0.125 2023-11-29 09:17:24,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 8.975e+01 9.727e+01 1.046e+02 1.425e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 09:17:36,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3903320.0, ans=10.0 2023-11-29 09:17:37,222 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8350, loss[loss=0.07257, simple_loss=0.09943, pruned_loss=0.01301, audio_tagging_loss=0.009846, over 14408.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08923, pruned_loss=0.01206, audio_tagging_loss=0.008471, over 3041677.94 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:17:37,303 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585500 2023-11-29 09:17:45,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3903320.0, ans=0.05 2023-11-29 09:17:50,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3903386.6666666665, ans=0.125 2023-11-29 09:17:53,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3903386.6666666665, ans=0.125 2023-11-29 09:17:54,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.00 vs. limit=15.0 2023-11-29 09:18:05,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3903453.3333333335, ans=0.0 2023-11-29 09:18:16,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3903520.0, ans=0.0 2023-11-29 09:18:37,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3903586.6666666665, ans=0.125 2023-11-29 09:18:39,267 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8400, loss[loss=0.05511, simple_loss=0.07446, pruned_loss=0.01046, audio_tagging_loss=0.007426, over 15512.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08961, pruned_loss=0.01226, audio_tagging_loss=0.008462, over 3043748.64 frames. 
], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:18:39,345 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585550 2023-11-29 09:18:44,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3903653.3333333335, ans=0.125 2023-11-29 09:19:13,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3903786.6666666665, ans=0.125 2023-11-29 09:19:28,560 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.927e+01 9.448e+01 1.050e+02 1.277e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-29 09:19:29,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2023-11-29 09:19:41,541 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8450, loss[loss=0.05165, simple_loss=0.07143, pruned_loss=0.00972, audio_tagging_loss=0.006212, over 14181.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08875, pruned_loss=0.01197, audio_tagging_loss=0.008578, over 3050387.31 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:19:41,617 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585600 2023-11-29 09:19:45,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3903986.6666666665, ans=0.0 2023-11-29 09:19:46,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2023-11-29 09:19:55,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3904053.3333333335, ans=0.125 2023-11-29 09:19:55,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3904053.3333333335, ans=0.125 2023-11-29 09:20:09,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3904120.0, ans=0.0 2023-11-29 09:20:15,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3904120.0, ans=0.1 2023-11-29 09:20:39,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3904253.3333333335, ans=0.0 2023-11-29 09:20:42,686 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8500, loss[loss=0.06118, simple_loss=0.0809, pruned_loss=0.01359, audio_tagging_loss=0.00714, over 13890.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08795, pruned_loss=0.0118, audio_tagging_loss=0.00867, over 3047556.03 frames. 
], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:20:42,764 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585650 2023-11-29 09:20:51,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3904320.0, ans=0.125 2023-11-29 09:21:13,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3904453.3333333335, ans=0.0 2023-11-29 09:21:14,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3904453.3333333335, ans=0.125 2023-11-29 09:21:22,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3904520.0, ans=0.09899494936611666 2023-11-29 09:21:25,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3904520.0, ans=0.0 2023-11-29 09:21:31,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 9.118e+01 9.879e+01 1.041e+02 1.425e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 09:21:44,142 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8550, loss[loss=0.07312, simple_loss=0.09683, pruned_loss=0.01535, audio_tagging_loss=0.009349, over 14850.00 frames. ], tot_loss[loss=0.06403, simple_loss=0.08726, pruned_loss=0.0117, audio_tagging_loss=0.008709, over 3051880.47 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:21:44,222 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585700 2023-11-29 09:21:54,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3904653.3333333335, ans=0.125 2023-11-29 09:22:28,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3904853.3333333335, ans=0.125 2023-11-29 09:22:36,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3904920.0, ans=0.125 2023-11-29 09:22:46,518 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8600, loss[loss=0.05665, simple_loss=0.08248, pruned_loss=0.007456, audio_tagging_loss=0.007958, over 14398.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08815, pruned_loss=0.01183, audio_tagging_loss=0.008679, over 3048945.67 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:22:46,601 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585750 2023-11-29 09:22:53,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3904986.6666666665, ans=0.0 2023-11-29 09:23:18,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-29 09:23:21,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3905186.6666666665, ans=0.125 2023-11-29 09:23:29,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.95 vs. 
2023-11-29 09:23:36,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.131e+01 9.509e+01 1.045e+02 1.246e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 09:23:44,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0 2023-11-29 09:23:47,811 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8650, loss[loss=0.064, simple_loss=0.08654, pruned_loss=0.01275, audio_tagging_loss=0.007976, over 15189.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.0884, pruned_loss=0.01185, audio_tagging_loss=0.00867, over 3048777.61 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:23:47,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585800 2023-11-29 09:23:56,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3905320.0, ans=0.125 2023-11-29 09:24:00,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3905386.6666666665, ans=0.2 2023-11-29 09:24:01,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=12.0 2023-11-29 09:24:21,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-29 09:24:25,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3905520.0, ans=0.125 2023-11-29 09:24:49,337 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8700, loss[loss=0.06532, simple_loss=0.09011, pruned_loss=0.011, audio_tagging_loss=0.009259, over 15332.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08842, pruned_loss=0.01194, audio_tagging_loss=0.008837, over 3049651.11 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:24:49,434 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585850 2023-11-29 09:25:01,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3905720.0, ans=0.125 2023-11-29 09:25:38,434 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.006e+01 9.124e+01 9.803e+01 1.053e+02 1.295e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 09:25:38,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3905920.0, ans=0.1 2023-11-29 09:25:43,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3905920.0, ans=0.125 2023-11-29 09:25:46,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3905920.0, ans=0.0 2023-11-29 09:25:51,253 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8750, loss[loss=0.05726, simple_loss=0.0761, pruned_loss=0.009928, audio_tagging_loss=0.009283, over 14831.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08912, pruned_loss=0.01207, audio_tagging_loss=0.00887, over 3055462.65 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0
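Note on the recurring [scaling.py:213] and [scaling.py:1022] lines: the ScheduledFloat entries dump parameter values (dropout rates, balancer probabilities, skip rates) as functions of the global batch_count, and the Whitening entries are companion diagnostics comparing a feature-whiteness metric against the limit at which a correction would engage. A simplified stand-in for a batch-count schedule; the schedule points below are illustrative, not icefall's actual defaults:

```python
# Simplified stand-in for a value scheduled on the global batch count.
# The real ScheduledFloat lives in icefall's scaling.py and is richer.
class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # linear interpolation between the surrounding points
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate decaying from 0.1 to 0.0 over the first 20k batches has
# long since reached its final value at batch_count ~ 3.9e6:
skip = ScheduledFloat((0.0, 0.1), (20000.0, 0.0))
print(skip.value(3904253.3))  # 0.0, as in the conv_skip_rate lines above
```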
2023-11-29 09:25:51,330 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585900 2023-11-29 09:25:52,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3905986.6666666665, ans=0.125 2023-11-29 09:26:14,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3906120.0, ans=0.125 2023-11-29 09:26:14,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0 2023-11-29 09:26:20,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3906120.0, ans=0.0 2023-11-29 09:26:49,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3906253.3333333335, ans=0.0 2023-11-29 09:26:51,796 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8800, loss[loss=0.04414, simple_loss=0.061, pruned_loss=0.003917, audio_tagging_loss=0.009722, over 14903.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09043, pruned_loss=0.0121, audio_tagging_loss=0.008911, over 3050926.16 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:26:51,902 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 585950 2023-11-29 09:27:02,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3906386.6666666665, ans=0.125 2023-11-29 09:27:26,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3906453.3333333335, ans=0.1 2023-11-29 09:27:40,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 9.118e+01 9.794e+01 1.059e+02 1.251e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 09:27:44,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3906586.6666666665, ans=0.125 2023-11-29 09:27:52,763 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8850, loss[loss=0.05732, simple_loss=0.0762, pruned_loss=0.009891, audio_tagging_loss=0.009323, over 14385.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08883, pruned_loss=0.01183, audio_tagging_loss=0.008901, over 3043520.06 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:27:52,839 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586000 2023-11-29 09:28:05,767 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:28:34,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3906853.3333333335, ans=0.125 2023-11-29 09:28:53,391 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8900, loss[loss=0.06559, simple_loss=0.08787, pruned_loss=0.0126, audio_tagging_loss=0.009056, over 14224.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08846, pruned_loss=0.01176, audio_tagging_loss=0.008759, over 3042909.44 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0
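Note on the "Exclude cut" WARNING above: the AudioSet cut is 100 feature frames long, the encoder's roughly 4x subsampling leaves 23 output frames, and the placeholder transcript tokenizes to 24 BPE tokens, so there are more labels than frames and the cut is dropped, presumably because the transducer loss needs at least one output frame per token. A hedged sketch of such a filter; the subsampling formula is an assumption chosen to reproduce 100 -> 23, not the exact one in the encoder:

```python
# Hedged reconstruction of the filter behind the WARNING above: drop cuts
# whose token count exceeds the number of encoder frames after subsampling.
def frames_after_subsampling(num_frames: int) -> int:
    # approximately 4x subsampling with edge effects (assumption)
    return (num_frames - 7) // 4  # 100 -> 23, as in the warning

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, cut is excluded
```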
2023-11-29 09:28:53,478 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586050 2023-11-29 09:29:29,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2023-11-29 09:29:42,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 9.012e+01 9.649e+01 1.060e+02 1.281e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 09:29:55,271 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 8950, loss[loss=0.07599, simple_loss=0.1058, pruned_loss=0.01574, audio_tagging_loss=0.007359, over 14763.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08781, pruned_loss=0.01167, audio_tagging_loss=0.00872, over 3044782.70 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:29:55,346 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586100 2023-11-29 09:30:05,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3907386.6666666665, ans=0.125 2023-11-29 09:30:52,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3907586.6666666665, ans=0.2 2023-11-29 09:30:53,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-29 09:30:56,182 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9000, loss[loss=0.05035, simple_loss=0.07063, pruned_loss=0.007078, audio_tagging_loss=0.007958, over 15181.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08807, pruned_loss=0.01168, audio_tagging_loss=0.008517, over 3045045.32 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:30:56,182 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 09:31:35,995 INFO [train_asr.py:1267] (3/4) Epoch 49, validation: loss=0.05863, simple_loss=0.05047, pruned_loss=0.00547, audio_tagging_loss=0.02792, over 4681554.00 frames. 2023-11-29 09:31:35,996 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 09:31:36,073 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586150 2023-11-29 09:31:57,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3907720.0, ans=0.05 2023-11-29 09:32:00,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0
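Note on the validation block above ("Computing validation loss", the "Epoch 49, validation:" line, and the peak-memory report): it fires at batch 9000, consistent with validation running at a fixed batch interval over an unchanging dev set (the same 4681554.00-frame total appears in each validation entry). A sketch of what such a step looks like, under assumed names; model, dev_loader and compute_loss are placeholders, not icefall's API:

```python
import torch

# Hedged sketch of the validation step these lines correspond to: average the
# same loss decomposition over a fixed dev set, then report peak GPU memory.
def run_validation(model, dev_loader, compute_loss, device="cuda:3"):
    model.eval()
    totals, frames = {}, 0
    with torch.no_grad():
        for batch in dev_loader:
            loss_info, num_frames = compute_loss(model, batch)
            for name, value in loss_info.items():
                totals[name] = totals.get(name, 0.0) + value * num_frames
            frames += num_frames
    model.train()
    avg = {name: value / frames for name, value in totals.items()}
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return avg, peak_mb  # e.g. loss=0.05863 over 4681554.00 frames, 24894MB
```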
2023-11-29 09:32:07,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3907786.6666666665, ans=0.125 2023-11-29 09:32:12,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3907853.3333333335, ans=0.04949747468305833 2023-11-29 09:32:17,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3907853.3333333335, ans=0.125 2023-11-29 09:32:25,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 9.291e+01 1.003e+02 1.087e+02 1.506e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-29 09:32:30,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3907920.0, ans=0.125 2023-11-29 09:32:37,797 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9050, loss[loss=0.05992, simple_loss=0.08773, pruned_loss=0.009365, audio_tagging_loss=0.006695, over 16377.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.08803, pruned_loss=0.01161, audio_tagging_loss=0.008445, over 3048439.79 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:32:37,892 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586200 2023-11-29 09:33:10,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3908120.0, ans=0.05 2023-11-29 09:33:39,604 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9100, loss[loss=0.06246, simple_loss=0.08198, pruned_loss=0.01377, audio_tagging_loss=0.007698, over 15371.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08834, pruned_loss=0.01164, audio_tagging_loss=0.008315, over 3049807.92 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:33:39,697 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586250 2023-11-29 09:33:43,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3908320.0, ans=0.2 2023-11-29 09:33:54,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2023-11-29 09:34:02,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3908386.6666666665, ans=0.0 2023-11-29 09:34:15,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3908520.0, ans=0.0 2023-11-29 09:34:30,297 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 9.126e+01 9.745e+01 1.074e+02 1.723e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 09:34:37,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3908586.6666666665, ans=0.2 2023-11-29 09:34:40,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3908653.3333333335, ans=0.1 2023-11-29 09:34:41,028 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9150, loss[loss=0.04773, simple_loss=0.06788, pruned_loss=0.006586, audio_tagging_loss=0.007207, over 14340.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08932, pruned_loss=0.0118, audio_tagging_loss=0.008313, over 3043283.58 frames.
], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:34:41,101 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586300 2023-11-29 09:34:50,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3908653.3333333335, ans=0.0 2023-11-29 09:35:04,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3908720.0, ans=0.0 2023-11-29 09:35:34,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3908920.0, ans=0.0 2023-11-29 09:35:44,164 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9200, loss[loss=0.0518, simple_loss=0.07011, pruned_loss=0.007782, audio_tagging_loss=0.00896, over 15339.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08946, pruned_loss=0.01198, audio_tagging_loss=0.008375, over 3055111.09 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:35:44,244 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586350 2023-11-29 09:36:23,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3909186.6666666665, ans=0.125 2023-11-29 09:36:34,385 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.956e+01 9.492e+01 1.042e+02 1.619e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-29 09:36:45,685 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9250, loss[loss=0.0423, simple_loss=0.05502, pruned_loss=0.004115, audio_tagging_loss=0.01067, over 13612.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08828, pruned_loss=0.01166, audio_tagging_loss=0.008365, over 3052089.08 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:36:45,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586400 2023-11-29 09:36:52,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3909320.0, ans=0.1 2023-11-29 09:36:54,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3909320.0, ans=0.125 2023-11-29 09:36:59,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3909386.6666666665, ans=0.0 2023-11-29 09:37:14,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-11-29 09:37:22,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3909520.0, ans=0.025 2023-11-29 09:37:24,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3909520.0, ans=0.07 2023-11-29 09:37:24,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.69 vs. 
limit=10.0 2023-11-29 09:37:40,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909586.6666666665, ans=0.1 2023-11-29 09:37:45,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3909586.6666666665, ans=0.125 2023-11-29 09:37:46,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3909653.3333333335, ans=0.0 2023-11-29 09:37:47,391 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9300, loss[loss=0.04888, simple_loss=0.06743, pruned_loss=0.006671, audio_tagging_loss=0.008498, over 16832.00 frames. ], tot_loss[loss=0.06394, simple_loss=0.08797, pruned_loss=0.01152, audio_tagging_loss=0.008435, over 3056006.30 frames. ], batch size: 65, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:37:47,536 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586450 2023-11-29 09:38:23,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3909853.3333333335, ans=0.2 2023-11-29 09:38:38,306 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 9.042e+01 9.889e+01 1.074e+02 1.345e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-29 09:38:49,381 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9350, loss[loss=0.05694, simple_loss=0.0788, pruned_loss=0.01037, audio_tagging_loss=0.007175, over 14163.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08841, pruned_loss=0.01169, audio_tagging_loss=0.008527, over 3055658.72 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:38:49,474 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586500 2023-11-29 09:38:59,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3909986.6666666665, ans=0.0 2023-11-29 09:39:33,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3910186.6666666665, ans=0.125 2023-11-29 09:39:39,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3910253.3333333335, ans=0.0 2023-11-29 09:39:51,638 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9400, loss[loss=0.05776, simple_loss=0.07353, pruned_loss=0.01312, audio_tagging_loss=0.007869, over 15534.00 frames. ], tot_loss[loss=0.06407, simple_loss=0.08787, pruned_loss=0.01156, audio_tagging_loss=0.008579, over 3051461.36 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:39:51,723 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586550 2023-11-29 09:40:23,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. 
limit=10.0 2023-11-29 09:40:28,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3910520.0, ans=0.0 2023-11-29 09:40:28,382 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:40:35,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3910520.0, ans=0.05 2023-11-29 09:40:36,322 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:40:42,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.989e+01 9.545e+01 1.042e+02 1.282e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 09:40:52,742 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:40:53,886 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9450, loss[loss=0.05767, simple_loss=0.07871, pruned_loss=0.01006, audio_tagging_loss=0.008252, over 14980.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.08767, pruned_loss=0.01153, audio_tagging_loss=0.008673, over 3049342.55 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:40:53,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586600 2023-11-29 09:40:59,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3910653.3333333335, ans=0.125 2023-11-29 09:41:11,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3910720.0, ans=0.0 2023-11-29 09:41:12,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3910720.0, ans=0.125 2023-11-29 09:41:14,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3910720.0, ans=0.0 2023-11-29 09:41:21,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0 2023-11-29 09:41:33,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3910853.3333333335, ans=0.125 2023-11-29 09:41:38,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3910853.3333333335, ans=0.2 2023-11-29 09:41:55,169 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9500, loss[loss=0.0619, simple_loss=0.08717, pruned_loss=0.009845, audio_tagging_loss=0.008472, over 16300.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.089, pruned_loss=0.01161, audio_tagging_loss=0.008695, over 3051972.43 frames. 
], batch size: 61, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:41:55,256 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586650 2023-11-29 09:41:55,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3910986.6666666665, ans=0.0 2023-11-29 09:42:12,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3911053.3333333335, ans=0.125 2023-11-29 09:42:17,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3911053.3333333335, ans=0.0 2023-11-29 09:42:24,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3911120.0, ans=0.0 2023-11-29 09:42:32,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3911186.6666666665, ans=0.5 2023-11-29 09:42:40,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3911186.6666666665, ans=0.95 2023-11-29 09:42:45,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.983e+01 9.496e+01 1.015e+02 1.388e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 09:42:49,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3911253.3333333335, ans=0.125 2023-11-29 09:42:55,724 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9550, loss[loss=0.05323, simple_loss=0.06712, pruned_loss=0.00814, audio_tagging_loss=0.01153, over 14845.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08881, pruned_loss=0.01151, audio_tagging_loss=0.008753, over 3056139.20 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:42:55,800 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586700 2023-11-29 09:43:17,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3911386.6666666665, ans=0.125 2023-11-29 09:43:18,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3911386.6666666665, ans=0.0 2023-11-29 09:43:21,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3911453.3333333335, ans=0.125 2023-11-29 09:43:31,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3911453.3333333335, ans=6.0 2023-11-29 09:43:33,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3911520.0, ans=0.125 2023-11-29 09:43:58,371 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9600, loss[loss=0.05745, simple_loss=0.08425, pruned_loss=0.008283, audio_tagging_loss=0.007038, over 15961.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08954, pruned_loss=0.01177, audio_tagging_loss=0.008742, over 3054873.74 frames. 
], batch size: 62, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:43:58,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586750 2023-11-29 09:44:33,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3911853.3333333335, ans=0.1 2023-11-29 09:44:41,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3911853.3333333335, ans=0.125 2023-11-29 09:44:49,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 9.041e+01 9.598e+01 1.061e+02 1.358e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 09:45:00,425 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9650, loss[loss=0.05455, simple_loss=0.08343, pruned_loss=0.006726, audio_tagging_loss=0.006105, over 15366.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08949, pruned_loss=0.01183, audio_tagging_loss=0.008744, over 3049766.35 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:45:00,498 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586800 2023-11-29 09:45:01,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3911986.6666666665, ans=0.125 2023-11-29 09:45:22,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2023-11-29 09:45:33,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3912120.0, ans=0.125 2023-11-29 09:45:59,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3912253.3333333335, ans=0.0 2023-11-29 09:46:01,551 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9700, loss[loss=0.07255, simple_loss=0.09681, pruned_loss=0.01635, audio_tagging_loss=0.007795, over 15476.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08941, pruned_loss=0.01185, audio_tagging_loss=0.008585, over 3048606.93 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:46:01,668 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586850 2023-11-29 09:46:13,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3912386.6666666665, ans=0.125 2023-11-29 09:46:53,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.022e+01 9.629e+01 1.062e+02 1.416e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 09:47:00,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-11-29 09:47:03,536 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9750, loss[loss=0.05933, simple_loss=0.08773, pruned_loss=0.007527, audio_tagging_loss=0.007934, over 15542.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08869, pruned_loss=0.01172, audio_tagging_loss=0.008509, over 3047813.11 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:47:03,632 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586900 2023-11-29 09:47:54,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3912920.0, ans=0.2 2023-11-29 09:48:07,005 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9800, loss[loss=0.06583, simple_loss=0.094, pruned_loss=0.01201, audio_tagging_loss=0.006822, over 16063.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08898, pruned_loss=0.01179, audio_tagging_loss=0.008432, over 3049682.51 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:48:07,088 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 586950 2023-11-29 09:48:08,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3912986.6666666665, ans=0.5 2023-11-29 09:48:20,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=12.0 2023-11-29 09:48:27,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-29 09:48:37,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-29 09:48:41,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913186.6666666665, ans=0.1 2023-11-29 09:48:51,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-29 09:48:59,765 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.885e+01 9.275e+01 1.009e+02 1.063e+02 1.388e+02, threshold=2.019e+02, percent-clipped=0.0 2023-11-29 09:49:02,186 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:49:08,175 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9850, loss[loss=0.09192, simple_loss=0.1287, pruned_loss=0.0207, audio_tagging_loss=0.006883, over 16097.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08917, pruned_loss=0.01175, audio_tagging_loss=0.008351, over 3054414.52 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 8.0
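Note on grad_scale in these entries: it is the dynamic loss-scaling factor of fp16 training, which is halved when a step overflows and grown back after a run of clean steps; that is why it drifts between 8.0, 16.0 and 32.0 over this stretch of the log. A minimal scaler with that behaviour; the backoff/growth constants mirror common AMP defaults and are assumptions, and the training script may manage its scale differently:

```python
# Minimal dynamic loss scaler illustrating why grad_scale in the log moves
# between 8.0, 16.0 and 32.0. Constants are assumptions (common AMP defaults).
class DynamicLossScaler:
    def __init__(self, scale=32.0, backoff=0.5, growth=2.0, growth_interval=2000):
        self.scale = scale
        self.backoff = backoff            # multiply on overflow
        self.growth = growth              # multiply after enough clean steps
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_inf: bool):
        if found_inf:                     # overflow: halve the scale, skip step
            self.scale *= self.backoff
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps >= self.growth_interval:
                self.scale *= self.growth # clean streak: grow the scale back
                self._clean_steps = 0
```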
2023-11-29 09:49:08,246 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587000 2023-11-29 09:49:14,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3913320.0, ans=0.125 2023-11-29 09:49:45,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913520.0, ans=0.1 2023-11-29 09:49:46,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3913520.0, ans=0.125 2023-11-29 09:49:46,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3913520.0, ans=0.1 2023-11-29 09:49:54,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2023-11-29 09:49:55,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3913520.0, ans=0.125 2023-11-29 09:49:57,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3913586.6666666665, ans=0.0 2023-11-29 09:50:10,920 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9900, loss[loss=0.0814, simple_loss=0.1134, pruned_loss=0.01877, audio_tagging_loss=0.005944, over 15273.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08871, pruned_loss=0.01171, audio_tagging_loss=0.008348, over 3048496.75 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 8.0 2023-11-29 09:50:10,999 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587050 2023-11-29 09:50:18,229 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:50:31,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3913720.0, ans=0.125 2023-11-29 09:50:36,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3913786.6666666665, ans=0.2 2023-11-29 09:50:40,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3913786.6666666665, ans=0.125 2023-11-29 09:50:54,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2023-11-29 09:51:00,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3913920.0, ans=0.0 2023-11-29 09:51:03,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.220e+01 9.633e+01 1.039e+02 1.360e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 09:51:08,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3913920.0, ans=0.5 2023-11-29 09:51:12,754 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 9950, loss[loss=0.05789, simple_loss=0.07693, pruned_loss=0.01032, audio_tagging_loss=0.009104, over 15767.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08828, pruned_loss=0.01169, audio_tagging_loss=0.008344, over 3058692.40 frames.
], batch size: 60, lr: 1.37e-03, grad_scale: 8.0 2023-11-29 09:51:12,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587100 2023-11-29 09:51:30,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2023-11-29 09:51:37,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3914120.0, ans=0.1 2023-11-29 09:52:08,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3914253.3333333335, ans=0.125 2023-11-29 09:52:14,041 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10000, loss[loss=0.06658, simple_loss=0.08604, pruned_loss=0.01344, audio_tagging_loss=0.01012, over 16115.00 frames. ], tot_loss[loss=0.06392, simple_loss=0.0878, pruned_loss=0.01157, audio_tagging_loss=0.008454, over 3053154.59 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:52:14,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587150 2023-11-29 09:52:55,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3914520.0, ans=0.0 2023-11-29 09:52:59,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.55 vs. limit=12.0 2023-11-29 09:53:07,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 9.076e+01 9.668e+01 1.049e+02 3.214e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-29 09:53:07,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3914586.6666666665, ans=0.1 2023-11-29 09:53:11,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0 2023-11-29 09:53:12,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2023-11-29 09:53:13,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3914586.6666666665, ans=0.125 2023-11-29 09:53:14,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0 2023-11-29 09:53:15,292 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10050, loss[loss=0.07014, simple_loss=0.1028, pruned_loss=0.01042, audio_tagging_loss=0.00831, over 16249.00 frames. ], tot_loss[loss=0.0637, simple_loss=0.08744, pruned_loss=0.01144, audio_tagging_loss=0.008544, over 3049713.64 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:53:15,388 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587200 2023-11-29 09:53:24,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.68 vs. 
limit=15.0 2023-11-29 09:54:02,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3914853.3333333335, ans=0.0 2023-11-29 09:54:09,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=15.0 2023-11-29 09:54:16,923 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10100, loss[loss=0.07484, simple_loss=0.1096, pruned_loss=0.01432, audio_tagging_loss=0.005715, over 16125.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08839, pruned_loss=0.01166, audio_tagging_loss=0.008453, over 3050305.53 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:54:17,013 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587250 2023-11-29 09:54:22,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3914986.6666666665, ans=0.125 2023-11-29 09:54:26,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3914986.6666666665, ans=0.0 2023-11-29 09:54:29,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3915053.3333333335, ans=0.125 2023-11-29 09:54:55,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3915186.6666666665, ans=0.0 2023-11-29 09:54:57,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3915186.6666666665, ans=0.0 2023-11-29 09:54:58,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3915186.6666666665, ans=0.0 2023-11-29 09:55:06,666 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:55:10,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 9.039e+01 9.666e+01 1.026e+02 1.279e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 09:55:18,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3915320.0, ans=0.125 2023-11-29 09:55:19,728 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10150, loss[loss=0.05921, simple_loss=0.07991, pruned_loss=0.01119, audio_tagging_loss=0.008054, over 14820.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08782, pruned_loss=0.01169, audio_tagging_loss=0.008555, over 3052718.77 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:55:19,812 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587300 2023-11-29 09:55:46,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-11-29 09:55:48,752 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:55:54,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3915453.3333333335, ans=0.0 2023-11-29 09:56:17,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3915586.6666666665, ans=0.2 2023-11-29 09:56:20,699 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10200, loss[loss=0.06844, simple_loss=0.09699, pruned_loss=0.01275, audio_tagging_loss=0.007196, over 14170.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08804, pruned_loss=0.01164, audio_tagging_loss=0.008527, over 3053825.13 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:56:20,798 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587350 2023-11-29 09:56:22,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3915653.3333333335, ans=0.125 2023-11-29 09:56:22,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2023-11-29 09:56:27,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3915653.3333333335, ans=0.0 2023-11-29 09:56:44,802 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:57:14,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.990e+01 9.483e+01 1.035e+02 1.443e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-29 09:57:22,294 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10250, loss[loss=0.05008, simple_loss=0.06472, pruned_loss=0.00745, audio_tagging_loss=0.01027, over 15903.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08785, pruned_loss=0.01173, audio_tagging_loss=0.008596, over 3048101.16 frames. 
], batch size: 64, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:57:22,371 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587400 2023-11-29 09:57:33,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3915986.6666666665, ans=0.125 2023-11-29 09:57:36,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3916053.3333333335, ans=0.2 2023-11-29 09:57:59,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3916186.6666666665, ans=0.2 2023-11-29 09:58:17,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3916253.3333333335, ans=0.1 2023-11-29 09:58:25,270 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10300, loss[loss=0.07326, simple_loss=0.1095, pruned_loss=0.009027, audio_tagging_loss=0.009485, over 15240.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08765, pruned_loss=0.01161, audio_tagging_loss=0.008714, over 3044951.89 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:58:25,351 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587450 2023-11-29 09:58:33,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3916320.0, ans=0.1 2023-11-29 09:58:56,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=12.0 2023-11-29 09:59:11,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.19 vs. limit=22.5 2023-11-29 09:59:14,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.05 vs. limit=22.5 2023-11-29 09:59:16,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-29 09:59:16,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.60 vs. limit=12.0 2023-11-29 09:59:17,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3916586.6666666665, ans=0.015 2023-11-29 09:59:17,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3916586.6666666665, ans=0.0 2023-11-29 09:59:18,314 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 9.346e+01 9.831e+01 1.081e+02 1.349e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 09:59:26,186 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:59:27,080 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10350, loss[loss=0.06144, simple_loss=0.07782, pruned_loss=0.01062, audio_tagging_loss=0.01191, over 15436.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08757, pruned_loss=0.01165, audio_tagging_loss=0.008746, over 3038996.59 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:59:27,165 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587500 2023-11-29 09:59:28,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3916653.3333333335, ans=0.2 2023-11-29 09:59:47,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2023-11-29 09:59:49,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3916720.0, ans=0.2 2023-11-29 09:59:55,412 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2023-11-29 09:59:55,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-29 10:00:00,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3916786.6666666665, ans=0.2 2023-11-29 10:00:19,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.10 vs. limit=22.5 2023-11-29 10:00:28,910 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10400, loss[loss=0.0539, simple_loss=0.06197, pruned_loss=0.01119, audio_tagging_loss=0.01173, over 14734.00 frames. ], tot_loss[loss=0.06366, simple_loss=0.0866, pruned_loss=0.01143, audio_tagging_loss=0.008929, over 3036688.28 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:00:28,994 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587550 2023-11-29 10:00:39,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3916986.6666666665, ans=0.1 2023-11-29 10:00:40,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3917053.3333333335, ans=0.0 2023-11-29 10:00:44,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2023-11-29 10:00:50,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. 
limit=15.0 2023-11-29 10:00:53,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3917120.0, ans=0.125 2023-11-29 10:01:04,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3917120.0, ans=0.0 2023-11-29 10:01:17,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3917253.3333333335, ans=0.035 2023-11-29 10:01:22,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.130e+01 9.665e+01 1.040e+02 1.258e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 10:01:25,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3917253.3333333335, ans=0.125 2023-11-29 10:01:29,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3917253.3333333335, ans=0.1 2023-11-29 10:01:31,580 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10450, loss[loss=0.05856, simple_loss=0.07961, pruned_loss=0.01058, audio_tagging_loss=0.008179, over 15594.00 frames. ], tot_loss[loss=0.0636, simple_loss=0.08666, pruned_loss=0.01141, audio_tagging_loss=0.008859, over 3035685.01 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:01:31,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587600 2023-11-29 10:02:18,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3917520.0, ans=0.125 2023-11-29 10:02:19,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3917520.0, ans=0.0 2023-11-29 10:02:33,211 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10500, loss[loss=0.05502, simple_loss=0.07155, pruned_loss=0.007943, audio_tagging_loss=0.01131, over 15409.00 frames. ], tot_loss[loss=0.06366, simple_loss=0.08688, pruned_loss=0.01155, audio_tagging_loss=0.008667, over 3031762.84 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:02:33,381 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587650 2023-11-29 10:03:06,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3917786.6666666665, ans=0.125 2023-11-29 10:03:26,858 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.916e+01 9.691e+01 1.026e+02 1.437e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 10:03:35,734 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10550, loss[loss=0.05564, simple_loss=0.07796, pruned_loss=0.007635, audio_tagging_loss=0.009023, over 15608.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08799, pruned_loss=0.0117, audio_tagging_loss=0.00854, over 3037305.06 frames. 
], batch size: 62, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:03:35,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587700 2023-11-29 10:04:03,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3918120.0, ans=0.0 2023-11-29 10:04:08,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3918120.0, ans=0.125 2023-11-29 10:04:11,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3918186.6666666665, ans=0.0 2023-11-29 10:04:23,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3918186.6666666665, ans=0.125 2023-11-29 10:04:25,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2023-11-29 10:04:38,656 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10600, loss[loss=0.06373, simple_loss=0.08159, pruned_loss=0.01123, audio_tagging_loss=0.01171, over 14529.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08826, pruned_loss=0.01172, audio_tagging_loss=0.008483, over 3041070.63 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:04:38,742 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587750 2023-11-29 10:04:47,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5 2023-11-29 10:04:54,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3918386.6666666665, ans=0.1 2023-11-29 10:04:58,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3918386.6666666665, ans=0.0 2023-11-29 10:05:12,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-29 10:05:22,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-29 10:05:23,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3918520.0, ans=0.05 2023-11-29 10:05:31,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 8.953e+01 9.680e+01 1.038e+02 1.330e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 10:05:40,138 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10650, loss[loss=0.09175, simple_loss=0.1288, pruned_loss=0.02065, audio_tagging_loss=0.006727, over 15774.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08813, pruned_loss=0.01176, audio_tagging_loss=0.008446, over 3035633.74 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:05:40,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587800 2023-11-29 10:05:41,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3918653.3333333335, ans=0.125 2023-11-29 10:05:47,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3918653.3333333335, ans=0.1 2023-11-29 10:06:07,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3918786.6666666665, ans=0.5 2023-11-29 10:06:21,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3918853.3333333335, ans=0.125 2023-11-29 10:06:30,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2023-11-29 10:06:35,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3918920.0, ans=0.125 2023-11-29 10:06:41,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3918986.6666666665, ans=0.125 2023-11-29 10:06:41,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3918986.6666666665, ans=0.0 2023-11-29 10:06:42,298 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10700, loss[loss=0.05773, simple_loss=0.08636, pruned_loss=0.00761, audio_tagging_loss=0.00694, over 15838.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08813, pruned_loss=0.01165, audio_tagging_loss=0.008487, over 3042169.64 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:06:42,396 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587850 2023-11-29 10:06:53,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3919053.3333333335, ans=0.125 2023-11-29 10:07:07,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3919120.0, ans=0.125 2023-11-29 10:07:17,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3919186.6666666665, ans=0.0 2023-11-29 10:07:21,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3919186.6666666665, ans=0.125 2023-11-29 10:07:36,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.064e+01 9.584e+01 1.018e+02 1.338e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 10:07:39,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3919253.3333333335, ans=0.125 2023-11-29 10:07:44,307 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10750, loss[loss=0.05707, simple_loss=0.07193, pruned_loss=0.01289, audio_tagging_loss=0.008213, over 15532.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08838, pruned_loss=0.01174, audio_tagging_loss=0.008441, over 3045801.42 frames. 
], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:07:44,418 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587900 2023-11-29 10:07:51,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=22.5 2023-11-29 10:07:58,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3919386.6666666665, ans=0.0 2023-11-29 10:08:00,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=10.0 2023-11-29 10:08:22,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2023-11-29 10:08:29,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0 2023-11-29 10:08:31,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3919520.0, ans=0.125 2023-11-29 10:08:39,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3919586.6666666665, ans=0.1 2023-11-29 10:08:44,882 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10800, loss[loss=0.08882, simple_loss=0.1295, pruned_loss=0.01876, audio_tagging_loss=0.005331, over 16080.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.0889, pruned_loss=0.01184, audio_tagging_loss=0.008408, over 3045534.27 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:08:44,979 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 587950 2023-11-29 10:08:58,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2023-11-29 10:09:39,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.936e+01 9.085e+01 9.620e+01 1.015e+02 1.229e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 10:09:47,016 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10850, loss[loss=0.04509, simple_loss=0.05792, pruned_loss=0.006226, audio_tagging_loss=0.009906, over 16541.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08845, pruned_loss=0.01164, audio_tagging_loss=0.008505, over 3044662.77 frames. 
], batch size: 65, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:09:47,106 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588000 2023-11-29 10:09:48,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3919986.6666666665, ans=0.125 2023-11-29 10:09:57,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3919986.6666666665, ans=0.125 2023-11-29 10:10:01,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3920053.3333333335, ans=0.125 2023-11-29 10:10:02,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3920053.3333333335, ans=0.1 2023-11-29 10:10:08,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3920053.3333333335, ans=0.125 2023-11-29 10:10:48,967 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:10:51,865 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10900, loss[loss=0.06131, simple_loss=0.0801, pruned_loss=0.0117, audio_tagging_loss=0.009566, over 14609.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.0881, pruned_loss=0.0116, audio_tagging_loss=0.008673, over 3040189.39 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:10:51,960 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588050 2023-11-29 10:10:56,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3920320.0, ans=10.0 2023-11-29 10:11:06,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3920386.6666666665, ans=0.1 2023-11-29 10:11:07,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3920386.6666666665, ans=0.125 2023-11-29 10:11:09,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3920386.6666666665, ans=0.1 2023-11-29 10:11:11,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3920386.6666666665, ans=0.125 2023-11-29 10:11:15,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3920453.3333333335, ans=0.0 2023-11-29 10:11:40,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3920586.6666666665, ans=0.2 2023-11-29 10:11:41,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3920586.6666666665, ans=0.125 2023-11-29 10:11:45,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.06 vs. 
limit=10.0 2023-11-29 10:11:47,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.221e+01 9.905e+01 1.064e+02 1.744e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 10:11:52,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=12.0 2023-11-29 10:11:53,348 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 10950, loss[loss=0.05638, simple_loss=0.07648, pruned_loss=0.008562, audio_tagging_loss=0.00958, over 16806.00 frames. ], tot_loss[loss=0.06403, simple_loss=0.08773, pruned_loss=0.01148, audio_tagging_loss=0.008688, over 3042707.19 frames. ], batch size: 64, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:11:53,428 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588100 2023-11-29 10:11:57,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3920653.3333333335, ans=10.0 2023-11-29 10:12:14,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3920720.0, ans=0.1 2023-11-29 10:12:14,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3920720.0, ans=0.1 2023-11-29 10:12:16,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3920786.6666666665, ans=0.125 2023-11-29 10:12:41,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3920853.3333333335, ans=0.125 2023-11-29 10:12:50,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3920920.0, ans=0.0 2023-11-29 10:12:54,874 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11000, loss[loss=0.05015, simple_loss=0.06884, pruned_loss=0.00864, audio_tagging_loss=0.007094, over 14773.00 frames. ], tot_loss[loss=0.06358, simple_loss=0.08714, pruned_loss=0.01135, audio_tagging_loss=0.008658, over 3045719.12 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:12:54,958 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588150 2023-11-29 10:12:56,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3920986.6666666665, ans=0.125 2023-11-29 10:12:57,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=22.5 2023-11-29 10:13:01,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3920986.6666666665, ans=0.0 2023-11-29 10:13:06,266 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 10:13:30,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3921120.0, ans=0.0 2023-11-29 10:13:32,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-29 10:13:50,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.176e+01 9.054e+01 9.771e+01 1.053e+02 1.277e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 10:13:54,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3921253.3333333335, ans=0.025 2023-11-29 10:13:56,550 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11050, loss[loss=0.05529, simple_loss=0.0736, pruned_loss=0.008092, audio_tagging_loss=0.0104, over 16454.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08899, pruned_loss=0.0117, audio_tagging_loss=0.008604, over 3060287.82 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:13:56,639 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588200 2023-11-29 10:14:19,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3921386.6666666665, ans=0.09899494936611666 2023-11-29 10:14:45,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3921586.6666666665, ans=0.2 2023-11-29 10:14:58,978 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11100, loss[loss=0.06175, simple_loss=0.09413, pruned_loss=0.006344, audio_tagging_loss=0.008345, over 14847.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08848, pruned_loss=0.01181, audio_tagging_loss=0.008812, over 3058465.91 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:14:59,075 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588250 2023-11-29 10:15:02,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3921653.3333333335, ans=0.125 2023-11-29 10:15:23,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3921786.6666666665, ans=0.125 2023-11-29 10:15:53,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 9.126e+01 9.732e+01 1.035e+02 1.401e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 10:15:59,062 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11150, loss[loss=0.08851, simple_loss=0.1165, pruned_loss=0.02002, audio_tagging_loss=0.01022, over 15501.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08876, pruned_loss=0.01186, audio_tagging_loss=0.008864, over 3052816.89 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:15:59,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588300 2023-11-29 10:16:40,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3922186.6666666665, ans=0.0 2023-11-29 10:16:48,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.11 vs. 
limit=22.5 2023-11-29 10:16:53,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-11-29 10:16:58,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5 2023-11-29 10:17:00,558 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11200, loss[loss=0.05464, simple_loss=0.07265, pruned_loss=0.008847, audio_tagging_loss=0.00946, over 16226.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08757, pruned_loss=0.01157, audio_tagging_loss=0.009047, over 3058163.56 frames. ], batch size: 62, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:17:00,661 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588350 2023-11-29 10:17:14,114 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:17:20,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3922386.6666666665, ans=0.07 2023-11-29 10:17:46,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3922520.0, ans=0.125 2023-11-29 10:17:56,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 9.269e+01 9.742e+01 1.059e+02 1.357e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 10:18:02,839 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11250, loss[loss=0.0388, simple_loss=0.04676, pruned_loss=0.006751, audio_tagging_loss=0.008675, over 14790.00 frames. ], tot_loss[loss=0.0637, simple_loss=0.08648, pruned_loss=0.0115, audio_tagging_loss=0.008952, over 3050394.15 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:18:02,954 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588400 2023-11-29 10:18:45,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3922853.3333333335, ans=0.09899494936611666 2023-11-29 10:18:56,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3922920.0, ans=0.5 2023-11-29 10:19:01,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3922920.0, ans=0.125 2023-11-29 10:19:01,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3922920.0, ans=0.0 2023-11-29 10:19:03,741 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11300, loss[loss=0.05874, simple_loss=0.08149, pruned_loss=0.01097, audio_tagging_loss=0.007023, over 15300.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08766, pruned_loss=0.01165, audio_tagging_loss=0.008785, over 3049594.31 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:19:03,829 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588450 2023-11-29 10:19:09,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3922986.6666666665, ans=0.125 2023-11-29 10:19:16,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. 
limit=15.0 2023-11-29 10:19:17,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3923053.3333333335, ans=0.5 2023-11-29 10:19:20,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3923053.3333333335, ans=0.125 2023-11-29 10:19:34,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3923120.0, ans=0.0 2023-11-29 10:19:49,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3923186.6666666665, ans=0.125 2023-11-29 10:19:58,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 9.209e+01 9.911e+01 1.088e+02 1.767e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 10:20:04,402 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11350, loss[loss=0.07639, simple_loss=0.1103, pruned_loss=0.01339, audio_tagging_loss=0.007834, over 14986.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08774, pruned_loss=0.01179, audio_tagging_loss=0.00864, over 3046202.26 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:20:04,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588500 2023-11-29 10:20:21,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2023-11-29 10:20:22,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3923386.6666666665, ans=0.125 2023-11-29 10:20:34,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3923453.3333333335, ans=0.0 2023-11-29 10:20:49,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3923520.0, ans=0.125 2023-11-29 10:20:50,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3923520.0, ans=0.0 2023-11-29 10:20:52,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3923586.6666666665, ans=0.125 2023-11-29 10:20:54,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3923586.6666666665, ans=0.125 2023-11-29 10:21:06,488 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11400, loss[loss=0.04105, simple_loss=0.05815, pruned_loss=0.004841, audio_tagging_loss=0.007141, over 14525.00 frames. ], tot_loss[loss=0.06408, simple_loss=0.08777, pruned_loss=0.01166, audio_tagging_loss=0.008533, over 3044072.18 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:21:06,566 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588550 2023-11-29 10:21:22,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3923720.0, ans=0.07 2023-11-29 10:22:01,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.974e+01 9.585e+01 1.035e+02 2.029e+02, threshold=1.917e+02, percent-clipped=1.0 2023-11-29 10:22:07,788 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11450, loss[loss=0.07709, simple_loss=0.1075, pruned_loss=0.01593, audio_tagging_loss=0.007432, over 15766.00 frames. 
], tot_loss[loss=0.06438, simple_loss=0.08802, pruned_loss=0.0118, audio_tagging_loss=0.00857, over 3043214.14 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:22:07,872 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588600 2023-11-29 10:22:08,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3923986.6666666665, ans=0.125 2023-11-29 10:22:26,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3924053.3333333335, ans=0.09899494936611666 2023-11-29 10:22:39,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3924120.0, ans=0.0 2023-11-29 10:23:09,783 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11500, loss[loss=0.07177, simple_loss=0.0948, pruned_loss=0.01513, audio_tagging_loss=0.009251, over 14683.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08765, pruned_loss=0.01162, audio_tagging_loss=0.008571, over 3039461.18 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:23:09,863 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588650 2023-11-29 10:23:17,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3924320.0, ans=0.125 2023-11-29 10:23:27,612 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:23:33,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3924453.3333333335, ans=0.125 2023-11-29 10:23:51,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3924520.0, ans=0.1 2023-11-29 10:23:54,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3924520.0, ans=0.125 2023-11-29 10:24:01,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3924586.6666666665, ans=0.1 2023-11-29 10:24:06,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.997e+01 9.583e+01 1.037e+02 1.357e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 10:24:08,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3924586.6666666665, ans=0.125 2023-11-29 10:24:09,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3924586.6666666665, ans=0.0 2023-11-29 10:24:10,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3924653.3333333335, ans=0.125 2023-11-29 10:24:11,585 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11550, loss[loss=0.07753, simple_loss=0.1028, pruned_loss=0.01738, audio_tagging_loss=0.008734, over 14238.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08831, pruned_loss=0.01171, audio_tagging_loss=0.008492, over 3040761.43 frames. 
], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:24:11,669 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588700 2023-11-29 10:24:19,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2023-11-29 10:24:28,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3924720.0, ans=0.125 2023-11-29 10:24:37,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3924786.6666666665, ans=0.125 2023-11-29 10:24:40,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.45 vs. limit=15.0 2023-11-29 10:24:49,075 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:24:59,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3924920.0, ans=0.05 2023-11-29 10:25:00,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3924920.0, ans=0.125 2023-11-29 10:25:12,267 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11600, loss[loss=0.03416, simple_loss=0.0444, pruned_loss=0.00382, audio_tagging_loss=0.008137, over 16296.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08857, pruned_loss=0.01169, audio_tagging_loss=0.008518, over 3046709.35 frames. ], batch size: 64, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:25:12,343 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588750 2023-11-29 10:25:24,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=15.0 2023-11-29 10:25:51,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3925186.6666666665, ans=0.0 2023-11-29 10:25:52,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=15.0 2023-11-29 10:26:00,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3925253.3333333335, ans=0.0 2023-11-29 10:26:09,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.110e+01 9.153e+01 9.919e+01 1.070e+02 2.477e+02, threshold=1.984e+02, percent-clipped=1.0 2023-11-29 10:26:14,590 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11650, loss[loss=0.0439, simple_loss=0.06021, pruned_loss=0.004474, audio_tagging_loss=0.009319, over 16226.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08862, pruned_loss=0.01172, audio_tagging_loss=0.008474, over 3046877.40 frames. 
], batch size: 64, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:26:14,683 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588800 2023-11-29 10:26:32,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3925386.6666666665, ans=10.0 2023-11-29 10:26:41,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3925453.3333333335, ans=0.125 2023-11-29 10:26:44,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3925453.3333333335, ans=0.0 2023-11-29 10:27:09,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3925586.6666666665, ans=10.0 2023-11-29 10:27:16,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3925653.3333333335, ans=0.0 2023-11-29 10:27:17,049 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11700, loss[loss=0.05099, simple_loss=0.06521, pruned_loss=0.007573, audio_tagging_loss=0.01081, over 15249.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.0883, pruned_loss=0.01158, audio_tagging_loss=0.008486, over 3048002.20 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:27:17,134 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588850 2023-11-29 10:27:32,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3925720.0, ans=0.125 2023-11-29 10:27:32,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3925720.0, ans=0.0 2023-11-29 10:27:32,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3925720.0, ans=0.07 2023-11-29 10:28:14,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 9.126e+01 9.606e+01 1.033e+02 1.375e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 10:28:17,890 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11750, loss[loss=0.06027, simple_loss=0.08048, pruned_loss=0.0113, audio_tagging_loss=0.008727, over 16197.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08875, pruned_loss=0.01159, audio_tagging_loss=0.008404, over 3052636.36 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:28:17,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588900 2023-11-29 10:28:30,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3926053.3333333335, ans=0.025 2023-11-29 10:29:08,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3926253.3333333335, ans=0.125 2023-11-29 10:29:12,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3926253.3333333335, ans=0.0 2023-11-29 10:29:20,704 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11800, loss[loss=0.07542, simple_loss=0.11, pruned_loss=0.01198, audio_tagging_loss=0.008455, over 14958.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.0883, pruned_loss=0.01151, audio_tagging_loss=0.008456, over 3052685.41 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:29:20,809 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 588950 2023-11-29 10:29:29,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.89 vs. limit=10.0 2023-11-29 10:29:52,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3926453.3333333335, ans=0.125 2023-11-29 10:29:57,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2023-11-29 10:30:09,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3926586.6666666665, ans=0.0 2023-11-29 10:30:18,162 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.043e+01 9.603e+01 1.049e+02 1.292e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 10:30:20,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3926653.3333333335, ans=0.0 2023-11-29 10:30:21,750 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11850, loss[loss=0.06493, simple_loss=0.09188, pruned_loss=0.01152, audio_tagging_loss=0.007473, over 14692.00 frames. ], tot_loss[loss=0.06386, simple_loss=0.08749, pruned_loss=0.01155, audio_tagging_loss=0.008571, over 3044432.21 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:30:21,854 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589000 2023-11-29 10:30:28,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3926653.3333333335, ans=0.0 2023-11-29 10:30:35,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3926720.0, ans=0.0 2023-11-29 10:30:48,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3926786.6666666665, ans=0.2 2023-11-29 10:30:49,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2023-11-29 10:30:53,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2023-11-29 10:31:12,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3926920.0, ans=0.125 2023-11-29 10:31:18,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2023-11-29 10:31:20,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3926920.0, ans=0.0 2023-11-29 10:31:20,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3926920.0, ans=0.0 2023-11-29 10:31:22,940 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11900, loss[loss=0.0525, simple_loss=0.07175, pruned_loss=0.008662, audio_tagging_loss=0.007959, over 14831.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08828, pruned_loss=0.01146, audio_tagging_loss=0.008714, over 3048105.25 frames. 
], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:31:23,013 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589050 2023-11-29 10:31:30,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3926986.6666666665, ans=0.125 2023-11-29 10:31:31,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3926986.6666666665, ans=0.0 2023-11-29 10:31:32,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3926986.6666666665, ans=0.125 2023-11-29 10:31:54,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3927120.0, ans=0.0 2023-11-29 10:32:18,722 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 9.048e+01 9.528e+01 1.019e+02 1.404e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 10:32:22,293 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 11950, loss[loss=0.06457, simple_loss=0.08969, pruned_loss=0.01238, audio_tagging_loss=0.007341, over 15785.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08806, pruned_loss=0.01162, audio_tagging_loss=0.008852, over 3042843.42 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:32:22,401 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589100 2023-11-29 10:32:35,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3927386.6666666665, ans=0.125 2023-11-29 10:32:39,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3927386.6666666665, ans=0.125 2023-11-29 10:32:40,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3927386.6666666665, ans=0.1 2023-11-29 10:32:40,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3927386.6666666665, ans=0.125 2023-11-29 10:32:49,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3927453.3333333335, ans=0.2 2023-11-29 10:33:08,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3927520.0, ans=0.1 2023-11-29 10:33:12,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3927586.6666666665, ans=0.125 2023-11-29 10:33:20,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3927653.3333333335, ans=0.2 2023-11-29 10:33:21,416 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 12000, loss[loss=0.06722, simple_loss=0.09821, pruned_loss=0.01023, audio_tagging_loss=0.007882, over 14804.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08844, pruned_loss=0.01158, audio_tagging_loss=0.008934, over 3046459.16 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:33:21,417 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 10:34:01,191 INFO [train_asr.py:1267] (3/4) Epoch 49, validation: loss=0.0581, simple_loss=0.05045, pruned_loss=0.005444, audio_tagging_loss=0.02743, over 4681554.00 frames. 
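The loss fields in these train_asr.py records are mutually consistent with a fixed-weight sum: in every record in this section, loss ≈ 0.5 · simple_loss + pruned_loss + audio_tagging_loss, and the validation record just above checks out the same way (0.5 · 0.05045 + 0.005444 + 0.02743 ≈ 0.0581). A minimal sketch of that bookkeeping (the 0.5 and 1.0 weights are inferred from the logged numbers, not read out of train_asr.py):

```python
# Hedged sketch: recombine the component losses reported in each log record.
# simple_scale=0.5 and audio_tagging_scale=1.0 are inferred from the values
# in this log, not taken from the training code itself.
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, audio_tagging_scale=1.0):
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Validation record above: loss=0.0581, simple_loss=0.05045,
# pruned_loss=0.005444, audio_tagging_loss=0.02743
assert abs(combined_loss(0.05045, 0.005444, 0.02743) - 0.0581) < 1e-3
```

Read the paired blocks accordingly: loss[...] is the current batch, while tot_loss[...] appears to be an aggregate over the recent frames quoted with it, which is why tot_loss moves slowly while loss jumps from batch to batch.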
2023-11-29 10:34:01,191 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 10:34:01,234 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589150 2023-11-29 10:34:09,058 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:34:11,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3927720.0, ans=0.1 2023-11-29 10:34:46,419 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 0, loss[loss=0.0783, simple_loss=0.1004, pruned_loss=0.009748, audio_tagging_loss=0.01833, over 14210.00 frames. ], tot_loss[loss=0.0783, simple_loss=0.1004, pruned_loss=0.009748, audio_tagging_loss=0.01833, over 14210.00 frames. ], batch size: 53, lr: 1.36e-03, grad_scale: 32.0 2023-11-29 10:34:46,420 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 10:35:02,707 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7276, 4.3087, 4.7716, 4.3162], device='cuda:3') 2023-11-29 10:35:22,076 INFO [train_asr.py:1267] (3/4) Epoch 50, validation: loss=0.05785, simple_loss=0.05049, pruned_loss=0.005519, audio_tagging_loss=0.02709, over 4681554.00 frames. 2023-11-29 10:35:22,077 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 10:35:46,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3927940.0, ans=0.0 2023-11-29 10:35:51,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2023-11-29 10:35:52,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3927940.0, ans=0.0 2023-11-29 10:35:54,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.763e+01 9.469e+01 1.029e+02 1.110e+02 1.447e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-29 10:35:57,200 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589200 2023-11-29 10:35:57,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3927940.0, ans=0.0 2023-11-29 10:36:09,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3928006.6666666665, ans=0.125 2023-11-29 10:36:25,858 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 50, loss[loss=0.07557, simple_loss=0.09421, pruned_loss=0.01236, audio_tagging_loss=0.0161, over 15230.00 frames. ], tot_loss[loss=0.07235, simple_loss=0.08762, pruned_loss=0.01174, audio_tagging_loss=0.01681, over 691803.36 frames. 
], batch size: 55, lr: 1.36e-03, grad_scale: 16.0 2023-11-29 10:36:51,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3928273.3333333335, ans=0.125 2023-11-29 10:36:59,995 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589250 2023-11-29 10:37:10,518 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:37:13,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3928340.0, ans=0.1 2023-11-29 10:37:29,730 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 100, loss[loss=0.07226, simple_loss=0.1029, pruned_loss=0.01126, audio_tagging_loss=0.009532, over 15393.00 frames. ], tot_loss[loss=0.07145, simple_loss=0.08888, pruned_loss=0.01133, audio_tagging_loss=0.01568, over 1209394.39 frames. ], batch size: 57, lr: 1.36e-03, grad_scale: 16.0 2023-11-29 10:37:30,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3928473.3333333335, ans=0.0 2023-11-29 10:37:40,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3928540.0, ans=0.125 2023-11-29 10:37:43,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3928540.0, ans=0.125 2023-11-29 10:37:56,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3928606.6666666665, ans=0.125 2023-11-29 10:38:00,143 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.044e+01 1.010e+02 1.060e+02 1.133e+02 1.839e+02, threshold=2.120e+02, percent-clipped=0.0 2023-11-29 10:38:02,574 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589300 2023-11-29 10:38:02,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3928606.6666666665, ans=0.2 2023-11-29 10:38:07,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3928673.3333333335, ans=0.125 2023-11-29 10:38:08,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3928673.3333333335, ans=0.09899494936611666 2023-11-29 10:38:24,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3928740.0, ans=0.125 2023-11-29 10:38:27,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3928740.0, ans=0.0 2023-11-29 10:38:31,786 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 150, loss[loss=0.0726, simple_loss=0.09745, pruned_loss=0.01297, audio_tagging_loss=0.0109, over 15624.00 frames. ], tot_loss[loss=0.06959, simple_loss=0.08812, pruned_loss=0.0113, audio_tagging_loss=0.01424, over 1614749.02 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:38:39,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3928806.6666666665, ans=0.0 2023-11-29 10:38:44,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. 
limit=15.0 2023-11-29 10:38:49,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2023-11-29 10:38:53,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3928873.3333333335, ans=0.2 2023-11-29 10:39:05,935 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589350 2023-11-29 10:39:31,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3929073.3333333335, ans=0.1 2023-11-29 10:39:34,092 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 200, loss[loss=0.07276, simple_loss=0.1048, pruned_loss=0.01218, audio_tagging_loss=0.008197, over 15934.00 frames. ], tot_loss[loss=0.06891, simple_loss=0.08935, pruned_loss=0.01158, audio_tagging_loss=0.01265, over 1929172.14 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:39:35,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3929140.0, ans=0.0 2023-11-29 10:39:49,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3929206.6666666665, ans=0.125 2023-11-29 10:39:53,531 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:40:01,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-29 10:40:05,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 9.198e+01 9.931e+01 1.061e+02 1.225e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 10:40:07,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589400 2023-11-29 10:40:14,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3929340.0, ans=0.0 2023-11-29 10:40:36,884 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 250, loss[loss=0.06898, simple_loss=0.09875, pruned_loss=0.01361, audio_tagging_loss=0.005989, over 16531.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.0891, pruned_loss=0.01166, audio_tagging_loss=0.01128, over 2184063.84 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:40:55,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3929540.0, ans=0.125 2023-11-29 10:40:56,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3929540.0, ans=0.125 2023-11-29 10:40:59,893 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:41:10,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3929606.6666666665, ans=0.0 2023-11-29 10:41:11,073 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589450 2023-11-29 10:41:19,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3929673.3333333335, ans=0.1 2023-11-29 10:41:20,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.28 vs. 
limit=15.0 2023-11-29 10:41:40,654 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 300, loss[loss=0.08174, simple_loss=0.1202, pruned_loss=0.01482, audio_tagging_loss=0.006812, over 14942.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.08931, pruned_loss=0.01167, audio_tagging_loss=0.01049, over 2380969.31 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:41:44,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3929806.6666666665, ans=0.0 2023-11-29 10:41:45,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3929806.6666666665, ans=0.0 2023-11-29 10:41:58,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2023-11-29 10:42:08,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3929940.0, ans=0.0 2023-11-29 10:42:10,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3929940.0, ans=0.125 2023-11-29 10:42:11,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 9.170e+01 9.850e+01 1.054e+02 1.427e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 10:42:13,969 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589500 2023-11-29 10:42:30,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3930073.3333333335, ans=0.125 2023-11-29 10:42:36,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3930073.3333333335, ans=0.1 2023-11-29 10:42:42,527 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 350, loss[loss=0.06577, simple_loss=0.09385, pruned_loss=0.01109, audio_tagging_loss=0.007752, over 15612.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08873, pruned_loss=0.01147, audio_tagging_loss=0.009974, over 2533739.42 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:42:50,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.48 vs. limit=10.0 2023-11-29 10:43:00,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3930206.6666666665, ans=0.125 2023-11-29 10:43:16,877 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589550 2023-11-29 10:43:19,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.63 vs. limit=10.0 2023-11-29 10:43:20,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3930340.0, ans=0.0 2023-11-29 10:43:30,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3930340.0, ans=0.0 2023-11-29 10:43:44,367 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 400, loss[loss=0.04005, simple_loss=0.0468, pruned_loss=0.005392, audio_tagging_loss=0.01125, over 15209.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08859, pruned_loss=0.01136, audio_tagging_loss=0.009651, over 2643273.00 frames. 
], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:44:10,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3930606.6666666665, ans=0.1 2023-11-29 10:44:16,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 9.147e+01 9.646e+01 1.038e+02 1.524e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 10:44:18,432 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589600 2023-11-29 10:44:21,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=6.69 vs. limit=12.0 2023-11-29 10:44:47,884 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 450, loss[loss=0.06151, simple_loss=0.08028, pruned_loss=0.01063, audio_tagging_loss=0.01074, over 15172.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08797, pruned_loss=0.01147, audio_tagging_loss=0.009444, over 2732400.37 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:44:55,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3930806.6666666665, ans=0.0 2023-11-29 10:45:20,577 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589650 2023-11-29 10:45:48,721 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 500, loss[loss=0.06601, simple_loss=0.09352, pruned_loss=0.009819, audio_tagging_loss=0.00943, over 15806.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08729, pruned_loss=0.01141, audio_tagging_loss=0.009202, over 2807706.01 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:45:48,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3931140.0, ans=0.125 2023-11-29 10:46:21,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 9.167e+01 9.711e+01 1.057e+02 1.221e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 10:46:23,047 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589700 2023-11-29 10:46:50,329 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 550, loss[loss=0.07912, simple_loss=0.1108, pruned_loss=0.01804, audio_tagging_loss=0.005666, over 14977.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.08717, pruned_loss=0.01147, audio_tagging_loss=0.009008, over 2858204.14 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:46:50,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3931473.3333333335, ans=0.125 2023-11-29 10:46:57,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3931473.3333333335, ans=0.0 2023-11-29 10:47:23,895 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589750 2023-11-29 10:47:52,394 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 600, loss[loss=0.06042, simple_loss=0.08534, pruned_loss=0.009433, audio_tagging_loss=0.008316, over 14632.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08835, pruned_loss=0.01172, audio_tagging_loss=0.008812, over 2908211.43 frames. 
], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:47:53,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3931806.6666666665, ans=0.1 2023-11-29 10:47:57,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3931806.6666666665, ans=0.0 2023-11-29 10:48:08,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3931873.3333333335, ans=0.125 2023-11-29 10:48:11,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3931873.3333333335, ans=0.125 2023-11-29 10:48:22,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=22.5 2023-11-29 10:48:24,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.984e+01 9.771e+01 1.059e+02 2.081e+02, threshold=1.954e+02, percent-clipped=1.0 2023-11-29 10:48:25,488 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589800 2023-11-29 10:48:43,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2023-11-29 10:48:54,143 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 650, loss[loss=0.07658, simple_loss=0.1088, pruned_loss=0.01572, audio_tagging_loss=0.006456, over 15497.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08818, pruned_loss=0.01171, audio_tagging_loss=0.008838, over 2943986.69 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:48:54,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3932140.0, ans=0.0 2023-11-29 10:49:01,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3932140.0, ans=0.0 2023-11-29 10:49:10,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.20 vs. limit=10.0 2023-11-29 10:49:16,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-29 10:49:27,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589850 2023-11-29 10:49:30,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-29 10:49:33,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3932340.0, ans=0.0 2023-11-29 10:49:52,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3932406.6666666665, ans=0.0 2023-11-29 10:49:55,247 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 700, loss[loss=0.0612, simple_loss=0.08452, pruned_loss=0.00915, audio_tagging_loss=0.009788, over 15237.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08931, pruned_loss=0.01185, audio_tagging_loss=0.008706, over 2969883.79 frames. 
2023-11-29 10:49:58,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0
2023-11-29 10:50:06,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3932473.3333333335, ans=0.1
2023-11-29 10:50:06,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.53 vs. limit=10.0
2023-11-29 10:50:27,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 9.061e+01 9.779e+01 1.049e+02 1.414e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-29 10:50:28,810 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589900
2023-11-29 10:50:32,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3932673.3333333335, ans=0.025
2023-11-29 10:50:39,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3932673.3333333335, ans=0.125
2023-11-29 10:50:48,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0
2023-11-29 10:50:49,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0
2023-11-29 10:50:56,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3932806.6666666665, ans=0.09899494936611666
2023-11-29 10:50:57,730 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 750, loss[loss=0.06396, simple_loss=0.08103, pruned_loss=0.01378, audio_tagging_loss=0.009669, over 15857.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09033, pruned_loss=0.01189, audio_tagging_loss=0.008654, over 2995166.30 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 10:51:01,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3932806.6666666665, ans=0.1
2023-11-29 10:51:08,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3932873.3333333335, ans=0.0
2023-11-29 10:51:31,096 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 589950
2023-11-29 10:51:35,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3933006.6666666665, ans=0.0
2023-11-29 10:51:47,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=12.0
2023-11-29 10:51:53,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3933073.3333333335, ans=0.1
2023-11-29 10:51:59,217 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 800, loss[loss=0.06758, simple_loss=0.09492, pruned_loss=0.01015, audio_tagging_loss=0.009975, over 16135.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09004, pruned_loss=0.01191, audio_tagging_loss=0.008761, over 3009607.64 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 10:52:05,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3933140.0, ans=0.125
2023-11-29 10:52:11,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3933206.6666666665, ans=0.0
2023-11-29 10:52:13,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3933206.6666666665, ans=0.125
2023-11-29 10:52:25,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0
2023-11-29 10:52:32,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 9.309e+01 9.917e+01 1.087e+02 1.372e+02, threshold=1.983e+02, percent-clipped=0.0
2023-11-29 10:52:33,553 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590000
2023-11-29 10:52:49,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.73 vs. limit=10.0
2023-11-29 10:52:57,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3933406.6666666665, ans=0.125
2023-11-29 10:53:01,573 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 850, loss[loss=0.05789, simple_loss=0.07006, pruned_loss=0.01162, audio_tagging_loss=0.01123, over 15818.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09041, pruned_loss=0.01193, audio_tagging_loss=0.008769, over 3020537.32 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 10:53:14,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3933540.0, ans=0.0
2023-11-29 10:53:15,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3933540.0, ans=0.0
2023-11-29 10:53:35,601 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590050
2023-11-29 10:53:43,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3933673.3333333335, ans=0.125
2023-11-29 10:54:05,625 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 900, loss[loss=0.07142, simple_loss=0.08819, pruned_loss=0.01541, audio_tagging_loss=0.01192, over 15126.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08944, pruned_loss=0.01183, audio_tagging_loss=0.008891, over 3024842.15 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
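The optim.py:476 lines report a five-number summary (min, 25%, median, 75%, max) of recent gradient norms, and in every entry above the clipping threshold equals Clipping_scale times the logged median (e.g. 2.0 x 9.917e+01 = 1.983e+02). A minimal sketch of that bookkeeping, assuming a fixed-size history of per-step norms; illustrative only, not the actual optim.py code:

    import collections
    import torch

    class GradNormClipper:
        # Track recent grad norms; clip to clipping_scale * median of the history.
        def __init__(self, clipping_scale=2.0, history=200):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=history)

        def clip_(self, parameters):
            parameters = [p for p in parameters if p.grad is not None]
            norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in parameters))
            self.norms.append(norm.item())
            hist = torch.tensor(list(self.norms))
            q = [torch.quantile(hist, x).item() for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
            threshold = self.clipping_scale * q[2]  # threshold = scale * median, as in the log
            if norm.item() > threshold:  # percent-clipped counts how often this fires
                for p in parameters:
                    p.grad.mul_(threshold / norm.item())
            return q, threshold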
2023-11-29 10:54:10,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3933806.6666666665, ans=0.125
2023-11-29 10:54:36,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3933940.0, ans=0.2
2023-11-29 10:54:37,701 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 9.157e+01 9.744e+01 1.021e+02 1.316e+02, threshold=1.949e+02, percent-clipped=0.0
2023-11-29 10:54:39,047 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590100
2023-11-29 10:54:59,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3934073.3333333335, ans=0.125
2023-11-29 10:55:01,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3934073.3333333335, ans=0.125
2023-11-29 10:55:07,088 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 950, loss[loss=0.0668, simple_loss=0.09158, pruned_loss=0.01506, audio_tagging_loss=0.005952, over 14739.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08938, pruned_loss=0.01182, audio_tagging_loss=0.008787, over 3026088.81 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 10:55:13,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3934140.0, ans=0.125
2023-11-29 10:55:40,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3934273.3333333335, ans=0.125
2023-11-29 10:55:41,612 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590150
2023-11-29 10:55:45,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3934340.0, ans=0.125
2023-11-29 10:55:45,347 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 10:56:09,331 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1000, loss[loss=0.06978, simple_loss=0.09986, pruned_loss=0.0132, audio_tagging_loss=0.006652, over 15321.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08983, pruned_loss=0.01194, audio_tagging_loss=0.008597, over 3028856.38 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 10:56:37,483 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 10:56:40,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 9.201e+01 9.754e+01 1.071e+02 1.435e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-29 10:56:42,208 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590200
2023-11-29 10:56:49,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.38 vs. limit=15.0
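The WARNING above shows why AudioSet cuts carrying the dummy placeholder transcript get dropped: after the roughly 4x subsampling a 100-frame (1 s) cut yields only 23 encoder frames, fewer than its 24 BPE tokens, and a transducer loss needs at least as many encoder frames as output tokens. A sketch of such a validity filter, assuming a Lhotse-style cut object and a hypothetical tokens list; the exact boundary trim and the real train_asr.py:1481 logic may differ:

    def is_trainable(cut, tokens, subsampling_factor=4):
        # 100 fbank frames per second of audio (10 ms frame shift).
        num_frames = int(cut.duration * 100)
        # Rough estimate of frames surviving the convolutional frontend:
        # (100 - 7) // 4 = 23, matching the logged "after subsampling" count.
        num_frames_after = (num_frames - 7) // subsampling_factor
        if num_frames_after < len(tokens):
            print(f"Exclude cut with ID {cut.id} from training. "
                  f"Frames after subsampling: {num_frames_after}, tokens: {len(tokens)}")
            return False
        return True

Applied as a CutSet filter, e.g. cuts.filter(lambda c: is_trainable(c, tokenize(c))), where tokenize is whatever produces the BPE token sequence.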
2023-11-29 10:57:12,242 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1050, loss[loss=0.0747, simple_loss=0.1054, pruned_loss=0.01501, audio_tagging_loss=0.007007, over 15833.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08944, pruned_loss=0.01199, audio_tagging_loss=0.008477, over 3035298.62 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 10:57:16,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3934806.6666666665, ans=0.09899494936611666
2023-11-29 10:57:18,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3934806.6666666665, ans=0.0
2023-11-29 10:57:33,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3934873.3333333335, ans=0.125
2023-11-29 10:57:34,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.04 vs. limit=22.5
2023-11-29 10:57:34,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3934940.0, ans=0.125
2023-11-29 10:57:43,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0
2023-11-29 10:57:45,615 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590250
2023-11-29 10:57:46,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3934940.0, ans=0.2
2023-11-29 10:58:13,863 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1100, loss[loss=0.04756, simple_loss=0.05872, pruned_loss=0.008498, audio_tagging_loss=0.009697, over 14854.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08861, pruned_loss=0.0119, audio_tagging_loss=0.008406, over 3041577.82 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 10:58:19,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0
2023-11-29 10:58:19,951 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 10:58:21,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3935140.0, ans=0.025
2023-11-29 10:58:23,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3935140.0, ans=0.0
2023-11-29 10:58:26,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3935206.6666666665, ans=15.0
2023-11-29 10:58:31,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.55 vs. limit=6.0
2023-11-29 10:58:48,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 9.269e+01 9.621e+01 1.031e+02 1.312e+02, threshold=1.924e+02, percent-clipped=0.0
2023-11-29 10:58:48,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590300
2023-11-29 10:58:57,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3935340.0, ans=0.125
2023-11-29 10:59:06,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3935406.6666666665, ans=0.0
2023-11-29 10:59:08,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3935406.6666666665, ans=0.125
2023-11-29 10:59:14,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3935473.3333333335, ans=0.035
2023-11-29 10:59:16,211 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1150, loss[loss=0.06286, simple_loss=0.08984, pruned_loss=0.01335, audio_tagging_loss=0.00459, over 14815.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08908, pruned_loss=0.01185, audio_tagging_loss=0.008262, over 3041157.80 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 10:59:40,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0
2023-11-29 10:59:47,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3935606.6666666665, ans=0.2
2023-11-29 10:59:50,091 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590350
2023-11-29 11:00:08,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-11-29 11:00:18,779 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1200, loss[loss=0.06224, simple_loss=0.0845, pruned_loss=0.01179, audio_tagging_loss=0.008196, over 14198.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08857, pruned_loss=0.01179, audio_tagging_loss=0.008283, over 3036648.66 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:00:24,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3935806.6666666665, ans=0.125
2023-11-29 11:00:26,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0
2023-11-29 11:00:27,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.79 vs. limit=15.0
2023-11-29 11:00:28,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0
2023-11-29 11:00:33,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3935873.3333333335, ans=0.0
2023-11-29 11:00:45,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0
2023-11-29 11:00:51,842 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590400
2023-11-29 11:00:52,912 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.086e+01 9.906e+01 1.090e+02 1.794e+02, threshold=1.981e+02, percent-clipped=0.0
2023-11-29 11:00:55,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3936006.6666666665, ans=0.07
2023-11-29 11:01:08,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3936073.3333333335, ans=0.0
2023-11-29 11:01:11,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3936073.3333333335, ans=15.0
2023-11-29 11:01:21,003 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1250, loss[loss=0.05336, simple_loss=0.06996, pruned_loss=0.008092, audio_tagging_loss=0.01029, over 15818.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08961, pruned_loss=0.01186, audio_tagging_loss=0.008226, over 3041507.53 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:01:23,752 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:01:26,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=22.5
2023-11-29 11:01:50,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3936273.3333333335, ans=0.2
2023-11-29 11:01:51,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3936273.3333333335, ans=0.2
2023-11-29 11:01:55,033 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590450
2023-11-29 11:02:22,018 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1300, loss[loss=0.06156, simple_loss=0.09437, pruned_loss=0.008116, audio_tagging_loss=0.006263, over 14870.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08925, pruned_loss=0.0119, audio_tagging_loss=0.008249, over 3043420.72 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:02:42,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3936540.0, ans=0.125
2023-11-29 11:02:46,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3936606.6666666665, ans=0.125
2023-11-29 11:02:50,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3936606.6666666665, ans=0.95
2023-11-29 11:02:55,319 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590500
2023-11-29 11:02:56,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.938e+01 9.408e+01 1.020e+02 1.519e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-29 11:03:02,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3936673.3333333335, ans=0.0
2023-11-29 11:03:23,212 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1350, loss[loss=0.06577, simple_loss=0.08989, pruned_loss=0.01328, audio_tagging_loss=0.00755, over 14922.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08939, pruned_loss=0.01188, audio_tagging_loss=0.008273, over 3049521.06 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:03:54,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5
2023-11-29 11:03:56,245 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590550
2023-11-29 11:03:56,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3936940.0, ans=0.0
2023-11-29 11:04:02,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3937006.6666666665, ans=0.125
2023-11-29 11:04:11,544 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 11:04:19,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0
2023-11-29 11:04:25,682 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1400, loss[loss=0.05401, simple_loss=0.07796, pruned_loss=0.008698, audio_tagging_loss=0.006331, over 15053.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.089, pruned_loss=0.01183, audio_tagging_loss=0.008339, over 3053504.16 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:04:37,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3937206.6666666665, ans=0.125
2023-11-29 11:04:56,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3937273.3333333335, ans=0.125
2023-11-29 11:04:58,833 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590600
2023-11-29 11:04:59,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.992e+01 9.669e+01 1.038e+02 1.341e+02, threshold=1.934e+02, percent-clipped=0.0
2023-11-29 11:05:09,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3937340.0, ans=0.125
2023-11-29 11:05:11,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0
2023-11-29 11:05:26,976 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1450, loss[loss=0.05087, simple_loss=0.06828, pruned_loss=0.006294, audio_tagging_loss=0.01043, over 15610.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08842, pruned_loss=0.01173, audio_tagging_loss=0.008517, over 3054747.79 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0
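The logged losses decompose consistently: in every train_asr.py:1235 entry, loss equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. a pruned-transducer objective with the simple (trivial-joiner) term down-weighted by half plus an unscaled audio-tagging distillation term. A quick check against the per-batch numbers of the batch 1450 entry directly above (the weights are inferred from the logged values, which are rounded to four significant digits):

    simple_loss, pruned_loss, at_loss = 0.06828, 0.006294, 0.01043
    total = 0.5 * simple_loss + pruned_loss + at_loss
    print(round(total, 5))  # 0.05086, matching the logged loss=0.05087 up to rounding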
2023-11-29 11:05:32,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3937473.3333333335, ans=0.125
2023-11-29 11:05:37,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3937473.3333333335, ans=0.125
2023-11-29 11:05:53,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3937606.6666666665, ans=0.125
2023-11-29 11:05:58,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.67 vs. limit=6.0
2023-11-29 11:06:01,116 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590650
2023-11-29 11:06:05,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3937673.3333333335, ans=10.0
2023-11-29 11:06:28,792 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1500, loss[loss=0.07333, simple_loss=0.1098, pruned_loss=0.01248, audio_tagging_loss=0.005953, over 15240.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08909, pruned_loss=0.01181, audio_tagging_loss=0.008536, over 3053040.22 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:06:30,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3937806.6666666665, ans=0.0
2023-11-29 11:06:38,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3937806.6666666665, ans=0.1
2023-11-29 11:06:46,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3937873.3333333335, ans=0.07
2023-11-29 11:07:02,006 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590700
2023-11-29 11:07:03,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 9.150e+01 9.903e+01 1.059e+02 1.485e+02, threshold=1.981e+02, percent-clipped=0.0
2023-11-29 11:07:19,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3938073.3333333335, ans=0.125
2023-11-29 11:07:24,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3938073.3333333335, ans=0.5
2023-11-29 11:07:31,373 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1550, loss[loss=0.05584, simple_loss=0.07429, pruned_loss=0.009607, audio_tagging_loss=0.009089, over 14431.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.0888, pruned_loss=0.01187, audio_tagging_loss=0.008681, over 3051301.89 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:07:50,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0
2023-11-29 11:08:00,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3938273.3333333335, ans=0.0
2023-11-29 11:08:03,576 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590750
2023-11-29 11:08:22,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3938406.6666666665, ans=0.125
2023-11-29 11:08:32,423 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1600, loss[loss=0.06599, simple_loss=0.07952, pruned_loss=0.01624, audio_tagging_loss=0.009986, over 14213.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08924, pruned_loss=0.01191, audio_tagging_loss=0.008687, over 3053396.84 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:08:51,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3938540.0, ans=0.2
2023-11-29 11:08:55,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3938540.0, ans=0.0
2023-11-29 11:09:06,571 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590800
2023-11-29 11:09:07,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.890e+01 8.907e+01 9.577e+01 1.022e+02 1.784e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-29 11:09:19,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0
2023-11-29 11:09:34,416 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1650, loss[loss=0.06593, simple_loss=0.09075, pruned_loss=0.01111, audio_tagging_loss=0.009441, over 14275.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08931, pruned_loss=0.01191, audio_tagging_loss=0.008728, over 3050125.05 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:09:34,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3938806.6666666665, ans=0.2
2023-11-29 11:09:45,278 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:09:55,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3938873.3333333335, ans=0.035
2023-11-29 11:09:58,709 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:10:01,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.35 vs. limit=15.0
2023-11-29 11:10:08,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590850
2023-11-29 11:10:32,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3939073.3333333335, ans=0.1
2023-11-29 11:10:32,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3939073.3333333335, ans=0.1
2023-11-29 11:10:36,640 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1700, loss[loss=0.07163, simple_loss=0.1064, pruned_loss=0.008291, audio_tagging_loss=0.01016, over 15479.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0902, pruned_loss=0.012, audio_tagging_loss=0.008658, over 3062115.02 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:10:39,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3939140.0, ans=0.1
2023-11-29 11:10:44,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3939140.0, ans=0.125
2023-11-29 11:10:57,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5
2023-11-29 11:11:09,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590900
2023-11-29 11:11:10,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.146e+01 9.599e+01 1.028e+02 1.355e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-29 11:11:29,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3939406.6666666665, ans=0.125
2023-11-29 11:11:38,439 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1750, loss[loss=0.06653, simple_loss=0.09852, pruned_loss=0.01166, audio_tagging_loss=0.005608, over 15292.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08976, pruned_loss=0.0119, audio_tagging_loss=0.008569, over 3057720.48 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:12:01,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3939540.0, ans=0.07
2023-11-29 11:12:03,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3939606.6666666665, ans=0.125
2023-11-29 11:12:12,004 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 590950
2023-11-29 11:12:24,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3939673.3333333335, ans=0.2
2023-11-29 11:12:39,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3939806.6666666665, ans=10.0
2023-11-29 11:12:40,146 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1800, loss[loss=0.06986, simple_loss=0.1011, pruned_loss=0.01051, audio_tagging_loss=0.008814, over 15461.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0899, pruned_loss=0.01193, audio_tagging_loss=0.008515, over 3058093.51 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:12:48,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3939806.6666666665, ans=0.125
2023-11-29 11:13:00,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3939873.3333333335, ans=0.0
2023-11-29 11:13:13,658 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591000
2023-11-29 11:13:14,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.256e+01 9.797e+01 1.069e+02 1.253e+02, threshold=1.959e+02, percent-clipped=0.0
2023-11-29 11:13:17,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0
2023-11-29 11:13:22,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3940006.6666666665, ans=0.125
2023-11-29 11:13:42,527 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1850, loss[loss=0.07469, simple_loss=0.1069, pruned_loss=0.01396, audio_tagging_loss=0.007299, over 15599.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08886, pruned_loss=0.01174, audio_tagging_loss=0.008507, over 3054235.07 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:14:05,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3940273.3333333335, ans=0.125
2023-11-29 11:14:14,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3940273.3333333335, ans=0.0
2023-11-29 11:14:15,327 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591050
2023-11-29 11:14:43,502 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1900, loss[loss=0.05719, simple_loss=0.0821, pruned_loss=0.01024, audio_tagging_loss=0.005902, over 14944.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08923, pruned_loss=0.01176, audio_tagging_loss=0.008421, over 3052899.11 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:15:17,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0
2023-11-29 11:15:17,972 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591100
2023-11-29 11:15:19,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.739e+01 9.784e+01 1.081e+02 1.359e+02, threshold=1.957e+02, percent-clipped=0.0
2023-11-29 11:15:28,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0
2023-11-29 11:15:37,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3940740.0, ans=0.125
2023-11-29 11:15:40,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0
2023-11-29 11:15:45,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3940806.6666666665, ans=0.125
2023-11-29 11:15:46,041 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 1950, loss[loss=0.07016, simple_loss=0.09849, pruned_loss=0.01156, audio_tagging_loss=0.009356, over 15515.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08814, pruned_loss=0.01153, audio_tagging_loss=0.008522, over 3047156.71 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:15:48,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3940806.6666666665, ans=0.125
2023-11-29 11:16:01,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2023-11-29 11:16:13,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3940940.0, ans=0.125
2023-11-29 11:16:18,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591150
2023-11-29 11:16:48,012 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2000, loss[loss=0.06942, simple_loss=0.08728, pruned_loss=0.01539, audio_tagging_loss=0.0104, over 15456.00 frames. ], tot_loss[loss=0.06357, simple_loss=0.08727, pruned_loss=0.01139, audio_tagging_loss=0.008538, over 3041086.75 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:16:54,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3941140.0, ans=0.125
2023-11-29 11:16:54,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=12.0
2023-11-29 11:16:57,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3941140.0, ans=0.1
2023-11-29 11:17:11,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0
2023-11-29 11:17:20,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591200
2023-11-29 11:17:21,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3941273.3333333335, ans=0.125
2023-11-29 11:17:21,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 9.211e+01 9.826e+01 1.048e+02 3.263e+02, threshold=1.965e+02, percent-clipped=1.0
2023-11-29 11:17:24,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3941340.0, ans=0.125
2023-11-29 11:17:46,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3941406.6666666665, ans=0.125
2023-11-29 11:17:49,472 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2050, loss[loss=0.0526, simple_loss=0.06915, pruned_loss=0.01062, audio_tagging_loss=0.007396, over 14102.00 frames. ], tot_loss[loss=0.06379, simple_loss=0.08775, pruned_loss=0.01146, audio_tagging_loss=0.008455, over 3046634.99 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0
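The Whitening lines fire only when a module's output statistics drift past a whitening limit; the metric grows as variance concentrates in a few directions within each channel group, and an isotropic ("white") output scores low. A rough illustrative proxy for such a metric, comparing the largest covariance eigenvalue to the mean eigenvalue per group; the real Whiten module in scaling.py uses a different formula and applies a penalty gradient rather than just measuring:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels) activations; returns a scalar >= 1 that
        # grows as variance concentrates in a few directions within each group.
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            g = g - g.mean(dim=0)
            cov = (g.T @ g) / g.shape[0]
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs.max() / eigs.mean().clamp(min=1e-20)).item())
        return max(metrics)

    # If the metric exceeds the limit (e.g. 15.0 in the entries above), the module
    # would nudge the activations back toward whiteness; below the limit it is a no-op.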
2023-11-29 11:17:59,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3941473.3333333335, ans=0.125
2023-11-29 11:17:59,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3941473.3333333335, ans=0.0
2023-11-29 11:18:10,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3941540.0, ans=0.125
2023-11-29 11:18:12,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3941540.0, ans=0.125
2023-11-29 11:18:19,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3941606.6666666665, ans=0.025
2023-11-29 11:18:24,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0
2023-11-29 11:18:24,884 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591250
2023-11-29 11:18:53,361 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2100, loss[loss=0.06614, simple_loss=0.08864, pruned_loss=0.009283, audio_tagging_loss=0.01254, over 15721.00 frames. ], tot_loss[loss=0.06386, simple_loss=0.08802, pruned_loss=0.01139, audio_tagging_loss=0.008454, over 3052483.33 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:19:01,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3941806.6666666665, ans=0.125
2023-11-29 11:19:12,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3941873.3333333335, ans=0.125
2023-11-29 11:19:26,514 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591300
2023-11-29 11:19:27,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.036e+01 9.652e+01 1.066e+02 1.265e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 11:19:30,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. limit=10.0
2023-11-29 11:19:43,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3942073.3333333335, ans=0.1
2023-11-29 11:19:43,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3942073.3333333335, ans=0.125
2023-11-29 11:19:55,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.47 vs. limit=22.5
2023-11-29 11:19:55,527 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2150, loss[loss=0.06273, simple_loss=0.08403, pruned_loss=0.01168, audio_tagging_loss=0.009036, over 16496.00 frames. ], tot_loss[loss=0.06383, simple_loss=0.0878, pruned_loss=0.01146, audio_tagging_loss=0.008463, over 3051659.85 frames. ], batch size: 62, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:19:55,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3942140.0, ans=0.125
2023-11-29 11:19:57,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3942140.0, ans=0.1
2023-11-29 11:20:10,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3942206.6666666665, ans=0.0
2023-11-29 11:20:18,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3942273.3333333335, ans=0.125
2023-11-29 11:20:20,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3942273.3333333335, ans=0.125
2023-11-29 11:20:28,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591350
2023-11-29 11:20:33,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3942340.0, ans=0.125
2023-11-29 11:20:34,431 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 11:20:56,523 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2200, loss[loss=0.07203, simple_loss=0.1056, pruned_loss=0.01298, audio_tagging_loss=0.006253, over 16720.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08846, pruned_loss=0.01155, audio_tagging_loss=0.008498, over 3055988.31 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:21:30,769 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591400
2023-11-29 11:21:33,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.050e+01 9.112e+01 9.556e+01 1.057e+02 1.343e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-29 11:21:58,269 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2250, loss[loss=0.07535, simple_loss=0.1024, pruned_loss=0.01532, audio_tagging_loss=0.008857, over 16233.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08891, pruned_loss=0.01151, audio_tagging_loss=0.008491, over 3049839.96 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:22:02,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3942806.6666666665, ans=0.125
2023-11-29 11:22:09,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=12.0
2023-11-29 11:22:32,836 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591450
2023-11-29 11:22:54,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3943073.3333333335, ans=0.0
2023-11-29 11:23:00,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3943140.0, ans=0.0
2023-11-29 11:23:01,167 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2300, loss[loss=0.05231, simple_loss=0.06503, pruned_loss=0.008208, audio_tagging_loss=0.01159, over 16081.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08887, pruned_loss=0.01168, audio_tagging_loss=0.008591, over 3053173.07 frames. ], batch size: 62, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:23:06,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3943140.0, ans=0.0
2023-11-29 11:23:33,652 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591500
2023-11-29 11:23:35,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3943273.3333333335, ans=0.1
2023-11-29 11:23:36,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.045e+01 9.649e+01 1.036e+02 1.193e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 11:23:38,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3943340.0, ans=0.125
2023-11-29 11:23:40,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=8.0
2023-11-29 11:23:42,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5
2023-11-29 11:23:45,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3943340.0, ans=0.125
2023-11-29 11:23:50,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3943406.6666666665, ans=0.0
2023-11-29 11:23:58,484 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 11:24:00,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0
2023-11-29 11:24:03,224 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2350, loss[loss=0.07181, simple_loss=0.1078, pruned_loss=0.01188, audio_tagging_loss=0.006025, over 14749.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.0888, pruned_loss=0.01163, audio_tagging_loss=0.008641, over 3052601.44 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:24:07,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3943473.3333333335, ans=0.0
2023-11-29 11:24:14,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=12.0
2023-11-29 11:24:32,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=15.0
2023-11-29 11:24:37,152 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591550
2023-11-29 11:24:42,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3943673.3333333335, ans=0.125
2023-11-29 11:24:43,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0
2023-11-29 11:25:01,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3943740.0, ans=0.2
2023-11-29 11:25:04,299 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2400, loss[loss=0.07714, simple_loss=0.1097, pruned_loss=0.01315, audio_tagging_loss=0.00914, over 16329.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08818, pruned_loss=0.01159, audio_tagging_loss=0.008749, over 3050471.96 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:25:21,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3943873.3333333335, ans=0.1
2023-11-29 11:25:24,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3943873.3333333335, ans=0.09899494936611666
2023-11-29 11:25:25,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0
2023-11-29 11:25:31,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3943940.0, ans=0.0
2023-11-29 11:25:38,204 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591600
2023-11-29 11:25:40,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.183e+01 9.372e+01 9.981e+01 1.068e+02 1.267e+02, threshold=1.996e+02, percent-clipped=0.0
2023-11-29 11:25:56,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3944073.3333333335, ans=0.125
2023-11-29 11:26:06,091 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2450, loss[loss=0.07855, simple_loss=0.1115, pruned_loss=0.01449, audio_tagging_loss=0.008299, over 14955.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.0877, pruned_loss=0.01146, audio_tagging_loss=0.0088, over 3050420.68 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
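The tot_loss entries are frame-weighted running averages over the epoch so far: each component is an accumulated (loss x frames) sum divided by the accumulated frame count, which is why the "over N frames" counter grows by roughly fifty batches' worth between consecutive log points. A minimal sketch of that accumulation, assuming per-batch losses are already normalized per frame; illustrative only (icefall's MetricsTracker also applies other bookkeeping that is ignored here):

    class RunningLoss:
        # Frame-weighted running means, as printed in the tot_loss[...] entries.
        def __init__(self):
            self.frames = 0.0
            self.sums = {}

        def update(self, num_frames, **losses):
            self.frames += num_frames
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

        def averages(self):
            return {name: s / self.frames for name, s in self.sums.items()}

    tracker = RunningLoss()
    tracker.update(15000, loss=0.064, simple_loss=0.088)
    tracker.update(16000, loss=0.066, simple_loss=0.090)
    print(tracker.averages())  # frame-weighted means over 31000 frames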
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:26:19,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3944206.6666666665, ans=0.0 2023-11-29 11:26:25,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3944206.6666666665, ans=0.0 2023-11-29 11:26:25,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-29 11:26:34,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3944273.3333333335, ans=0.125 2023-11-29 11:26:39,236 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591650 2023-11-29 11:26:43,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=22.5 2023-11-29 11:27:05,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3944406.6666666665, ans=0.2 2023-11-29 11:27:07,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3944473.3333333335, ans=0.0 2023-11-29 11:27:08,532 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2500, loss[loss=0.05889, simple_loss=0.08085, pruned_loss=0.01097, audio_tagging_loss=0.007496, over 14606.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08818, pruned_loss=0.01169, audio_tagging_loss=0.008805, over 3051818.07 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:27:09,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3944473.3333333335, ans=0.125 2023-11-29 11:27:10,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3944473.3333333335, ans=0.125 2023-11-29 11:27:10,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-29 11:27:24,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3944540.0, ans=0.125 2023-11-29 11:27:28,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3944540.0, ans=0.0 2023-11-29 11:27:40,753 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591700 2023-11-29 11:27:44,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.155e+01 9.688e+01 1.051e+02 1.449e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 11:27:48,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3944673.3333333335, ans=0.2 2023-11-29 11:28:03,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3944740.0, ans=0.0 2023-11-29 11:28:06,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.45 vs. 
limit=15.0 2023-11-29 11:28:08,708 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2550, loss[loss=0.06077, simple_loss=0.08446, pruned_loss=0.01109, audio_tagging_loss=0.007461, over 14885.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08859, pruned_loss=0.01171, audio_tagging_loss=0.008628, over 3053787.24 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:28:15,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0 2023-11-29 11:28:25,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3944873.3333333335, ans=0.125 2023-11-29 11:28:34,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3944940.0, ans=0.125 2023-11-29 11:28:38,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3944940.0, ans=0.0 2023-11-29 11:28:42,507 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591750 2023-11-29 11:28:53,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2023-11-29 11:29:06,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3945073.3333333335, ans=0.125 2023-11-29 11:29:08,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3945073.3333333335, ans=0.125 2023-11-29 11:29:10,121 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2600, loss[loss=0.08328, simple_loss=0.1187, pruned_loss=0.01565, audio_tagging_loss=0.008256, over 15229.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08828, pruned_loss=0.01157, audio_tagging_loss=0.008496, over 3055021.45 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:29:26,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. limit=10.0 2023-11-29 11:29:36,795 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:29:39,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3945273.3333333335, ans=0.125 2023-11-29 11:29:43,980 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591800 2023-11-29 11:29:47,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.895e+01 9.478e+01 1.021e+02 1.360e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 11:30:13,362 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2650, loss[loss=0.05386, simple_loss=0.07432, pruned_loss=0.008426, audio_tagging_loss=0.008276, over 13858.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08821, pruned_loss=0.01145, audio_tagging_loss=0.008408, over 3048529.84 frames. 
], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:30:28,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3945540.0, ans=0.0 2023-11-29 11:30:45,904 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591850 2023-11-29 11:30:51,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3945673.3333333335, ans=0.09899494936611666 2023-11-29 11:31:14,817 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2700, loss[loss=0.05989, simple_loss=0.08265, pruned_loss=0.01029, audio_tagging_loss=0.008267, over 14824.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08843, pruned_loss=0.01157, audio_tagging_loss=0.008401, over 3046631.22 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:31:39,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3945940.0, ans=0.02 2023-11-29 11:31:49,033 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591900 2023-11-29 11:31:49,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3945940.0, ans=0.07 2023-11-29 11:31:50,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3945940.0, ans=0.125 2023-11-29 11:31:53,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 9.168e+01 9.953e+01 1.095e+02 1.462e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-29 11:31:53,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3946006.6666666665, ans=0.07 2023-11-29 11:32:06,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3946073.3333333335, ans=0.0 2023-11-29 11:32:07,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3946073.3333333335, ans=0.1 2023-11-29 11:32:16,463 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2750, loss[loss=0.0589, simple_loss=0.09013, pruned_loss=0.008673, audio_tagging_loss=0.005163, over 15169.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08834, pruned_loss=0.01166, audio_tagging_loss=0.008405, over 3047159.22 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 11:32:33,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3946206.6666666665, ans=0.0 2023-11-29 11:32:49,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 591950 2023-11-29 11:32:55,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3946340.0, ans=0.125 2023-11-29 11:33:05,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3946406.6666666665, ans=0.125 2023-11-29 11:33:10,000 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:33:14,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3946406.6666666665, ans=0.125 2023-11-29 11:33:18,156 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2800, loss[loss=0.05528, simple_loss=0.07321, pruned_loss=0.009355, audio_tagging_loss=0.009314, over 14882.00 frames. ], tot_loss[loss=0.06381, simple_loss=0.08774, pruned_loss=0.01154, audio_tagging_loss=0.008401, over 3045112.97 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:33:20,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3946473.3333333335, ans=0.0 2023-11-29 11:33:31,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2023-11-29 11:33:51,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592000 2023-11-29 11:33:52,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3946606.6666666665, ans=0.0 2023-11-29 11:33:58,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3946673.3333333335, ans=0.2 2023-11-29 11:33:58,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 9.130e+01 9.870e+01 1.066e+02 1.963e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 11:34:02,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.40 vs. limit=15.0 2023-11-29 11:34:03,352 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:34:22,998 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2850, loss[loss=0.06998, simple_loss=0.09379, pruned_loss=0.01489, audio_tagging_loss=0.008195, over 14759.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.0886, pruned_loss=0.01179, audio_tagging_loss=0.00841, over 3040659.78 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:34:36,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3946873.3333333335, ans=0.0 2023-11-29 11:34:37,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3946873.3333333335, ans=0.125 2023-11-29 11:34:39,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3946873.3333333335, ans=0.0 2023-11-29 11:34:47,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.74 vs. 
limit=12.0
2023-11-29 11:34:52,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3946940.0, ans=0.2
2023-11-29 11:34:55,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3946940.0, ans=0.05
2023-11-29 11:34:56,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592050
2023-11-29 11:34:59,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0
2023-11-29 11:35:06,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3947006.6666666665, ans=0.0
2023-11-29 11:35:24,260 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2900, loss[loss=0.0654, simple_loss=0.0869, pruned_loss=0.01011, audio_tagging_loss=0.01184, over 14974.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08916, pruned_loss=0.01188, audio_tagging_loss=0.008331, over 3040595.59 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:35:52,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3947273.3333333335, ans=0.05
2023-11-29 11:35:55,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=22.5
2023-11-29 11:35:58,241 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592100
2023-11-29 11:36:02,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.122e+01 9.763e+01 1.061e+02 1.440e+02, threshold=1.953e+02, percent-clipped=0.0
2023-11-29 11:36:04,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3947340.0, ans=0.0
2023-11-29 11:36:04,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3947340.0, ans=0.125
2023-11-29 11:36:23,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0
2023-11-29 11:36:26,614 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 2950, loss[loss=0.05756, simple_loss=0.0825, pruned_loss=0.009201, audio_tagging_loss=0.00711, over 15398.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08983, pruned_loss=0.01199, audio_tagging_loss=0.008345, over 3042158.51 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:36:31,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3947473.3333333335, ans=0.0
2023-11-29 11:36:50,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3947606.6666666665, ans=0.125
2023-11-29 11:36:59,582 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592150
2023-11-29 11:37:27,896 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3000, loss[loss=0.06393, simple_loss=0.09125, pruned_loss=0.01046, audio_tagging_loss=0.007847, over 16594.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08996, pruned_loss=0.01196, audio_tagging_loss=0.008478, over 3051401.88 frames. ], batch size: 63, lr: 1.35e-03, grad_scale: 16.0
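
The train_asr.py:1235 entries above report a composite objective: throughout this section the logged loss equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss, so the transducer's simple path is down-weighted while the pruned and audio-tagging terms enter at full weight. Below is a minimal sketch of that combination; the helper name combine_losses is hypothetical, not the recipe's actual function.

    def combine_losses(
        simple_loss: float,
        pruned_loss: float,
        audio_tagging_loss: float,
        simple_loss_scale: float = 0.5,
        audio_tagging_loss_scale: float = 1.0,
    ) -> float:
        """Weighted total matching the values printed in the loss[...] entries."""
        return (
            simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss
        )

    # Check against the batch 3000 tot_loss just above:
    # 0.5 * 0.08996 + 0.01196 + 0.008478 = 0.06542, as logged.
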
2023-11-29 11:37:27,897 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-29 11:37:55,613 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9480, 1.8350, 3.4731, 2.8508, 2.9264, 2.9247, 3.0069, 3.1483], device='cuda:3')
2023-11-29 11:38:07,425 INFO [train_asr.py:1267] (3/4) Epoch 50, validation: loss=0.05782, simple_loss=0.05046, pruned_loss=0.005473, audio_tagging_loss=0.02712, over 4681554.00 frames.
2023-11-29 11:38:07,425 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-29 11:38:20,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3947873.3333333335, ans=0.05
2023-11-29 11:38:30,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0
2023-11-29 11:38:40,730 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592200
2023-11-29 11:38:43,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3948006.6666666665, ans=0.125
2023-11-29 11:38:45,629 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.881e+01 9.188e+01 9.766e+01 1.056e+02 1.297e+02, threshold=1.953e+02, percent-clipped=0.0
2023-11-29 11:38:47,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3948006.6666666665, ans=0.0
2023-11-29 11:38:57,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0
2023-11-29 11:39:02,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0
2023-11-29 11:39:09,583 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3050, loss[loss=0.06388, simple_loss=0.0831, pruned_loss=0.01458, audio_tagging_loss=0.007752, over 16167.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08914, pruned_loss=0.01171, audio_tagging_loss=0.008544, over 3059702.23 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:39:14,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3948140.0, ans=0.125
2023-11-29 11:39:32,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3948273.3333333335, ans=0.125
2023-11-29 11:39:42,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592250
2023-11-29 11:39:45,497 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
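
These recurring WARNING lines drop AudioSet placeholder cuts that are too short for their dummy transcript: 100 input frames shrink to 23 after the encoder's subsampling, fewer than the 24 BPE tokens, which can make the pruned transducer loss invalid. A sketch of this kind of length filter follows; the subsampling arithmetic reproduces the logged 100 -> 23, but the exact predicate in train_asr.py:1481 may differ, and keep_cut is a hypothetical name.

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts whose post-subsampling length can't cover the token sequence."""
        # Conv frontend + subsampling: 100 frames -> 23, as in the warning above.
        num_frames_subsampled = ((num_frames - 7) // 2 + 1) // 2
        return num_frames_subsampled >= num_tokens

    # For the excluded cut: ((100 - 7) // 2 + 1) // 2 = 23 < 24 tokens -> excluded.
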
2023-11-29 11:39:59,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3948406.6666666665, ans=0.125
2023-11-29 11:40:00,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3948406.6666666665, ans=0.125
2023-11-29 11:40:11,010 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3100, loss[loss=0.0465, simple_loss=0.06282, pruned_loss=0.006572, audio_tagging_loss=0.008521, over 14699.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.0893, pruned_loss=0.01182, audio_tagging_loss=0.008654, over 3051837.17 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:40:14,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3948473.3333333335, ans=0.07
2023-11-29 11:40:23,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0
2023-11-29 11:40:31,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3948540.0, ans=0.125
2023-11-29 11:40:41,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3948606.6666666665, ans=0.125
2023-11-29 11:40:41,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3948606.6666666665, ans=0.125
2023-11-29 11:40:43,920 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592300
2023-11-29 11:40:48,581 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 9.181e+01 9.927e+01 1.074e+02 1.864e+02, threshold=1.985e+02, percent-clipped=0.0
2023-11-29 11:40:52,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3948673.3333333335, ans=0.125
2023-11-29 11:40:54,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3948673.3333333335, ans=0.125
2023-11-29 11:41:03,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3948740.0, ans=0.125
2023-11-29 11:41:04,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3948740.0, ans=0.125
2023-11-29 11:41:04,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3948740.0, ans=0.0
2023-11-29 11:41:12,054 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3150, loss[loss=0.06831, simple_loss=0.1055, pruned_loss=0.008607, audio_tagging_loss=0.006969, over 15566.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08933, pruned_loss=0.01188, audio_tagging_loss=0.008698, over 3045539.18 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 11:41:12,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.87 vs. limit=15.0
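
The scaling.py:1022 "Whitening" entries track how far a module's output covariance is from white (a metric of 1.0 would be perfectly white), and the associated penalty only becomes active while the metric exceeds the logged limit. As a rough illustration, such a statistic can be computed from the eigenvalue spread of the feature covariance; the formula below is a plausible reconstruction, not the actual scaling.py code, and whitening_metric is a hypothetical name.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns E[lambda^2] / (E[lambda])^2
        of cov(x), which is 1.0 for white features and grows with anisotropy."""
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]      # channel covariance
        eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues of symmetric cov
        return (eigs ** 2).mean() / eigs.mean() ** 2

Under that reading, a metric of 2.87 against a limit of 15.0, as in the entry just above, leaves the penalty inactive.
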
2023-11-29 11:41:25,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3948873.3333333335, ans=0.2
2023-11-29 11:41:32,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3948873.3333333335, ans=0.0
2023-11-29 11:41:36,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3948940.0, ans=0.125
2023-11-29 11:41:45,107 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592350
2023-11-29 11:41:50,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3949006.6666666665, ans=0.125
2023-11-29 11:41:59,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3949073.3333333335, ans=0.125
2023-11-29 11:42:12,834 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3200, loss[loss=0.05325, simple_loss=0.06786, pruned_loss=0.005744, audio_tagging_loss=0.01357, over 14565.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09025, pruned_loss=0.01201, audio_tagging_loss=0.008752, over 3056707.24 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:42:17,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3949140.0, ans=0.2
2023-11-29 11:42:42,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3949273.3333333335, ans=0.125
2023-11-29 11:42:45,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3949273.3333333335, ans=0.0
2023-11-29 11:42:46,811 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592400
2023-11-29 11:42:51,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3949340.0, ans=0.05
2023-11-29 11:42:51,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0
2023-11-29 11:42:52,062 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.966e+01 9.651e+01 1.039e+02 1.549e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-29 11:43:06,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3949406.6666666665, ans=0.2
2023-11-29 11:43:15,844 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3250, loss[loss=0.05531, simple_loss=0.06634, pruned_loss=0.0105, audio_tagging_loss=0.01163, over 15916.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.0891, pruned_loss=0.01185, audio_tagging_loss=0.008828, over 3055997.01 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0
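
In the optim.py:476 entries, the clipping threshold is consistently 2.0x the logged median gradient norm (for the 11:42:52 entry above, 2.0 * 9.651e+01 = 1.930e+02), i.e. clipping adapts to the recent distribution of grad norms rather than using a fixed constant. Below is a minimal sketch of that idea under the assumption of a simple norm history; the optimizer's real bookkeeping (quartile reporting, warmup, per-parameter scaling) is more involved, and clip_by_median is a hypothetical name.

    import torch

    def clip_by_median(params, norm_history, clipping_scale=2.0):
        """Scale gradients down when their global norm exceeds
        clipping_scale * median(recent norms)."""
        grads = [p.grad for p in params if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        norm_history.append(total_norm.item())
        threshold = clipping_scale * torch.tensor(norm_history).median()
        if total_norm > threshold:
            for g in grads:
                g.mul_(threshold / total_norm)
        return total_norm

On that reading, percent-clipped=0.0 in these entries suggests no recent batch has exceeded the threshold.
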
2023-11-29 11:43:18,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3949473.3333333335, ans=0.0
2023-11-29 11:43:26,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3949473.3333333335, ans=0.125
2023-11-29 11:43:26,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=15.0
2023-11-29 11:43:27,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3949540.0, ans=0.125
2023-11-29 11:43:36,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3949540.0, ans=0.2
2023-11-29 11:43:49,404 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592450
2023-11-29 11:43:55,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3949673.3333333335, ans=0.05
2023-11-29 11:44:03,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3949740.0, ans=0.125
2023-11-29 11:44:17,902 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3300, loss[loss=0.0469, simple_loss=0.05792, pruned_loss=0.005666, audio_tagging_loss=0.01228, over 15620.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08793, pruned_loss=0.01158, audio_tagging_loss=0.008895, over 3063612.32 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:44:22,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5
2023-11-29 11:44:25,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3949806.6666666665, ans=0.125
2023-11-29 11:44:32,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3949873.3333333335, ans=0.2
2023-11-29 11:44:38,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3949873.3333333335, ans=0.2
2023-11-29 11:44:41,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
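
The many scaling.py:213 entries here and below print the current value (ans=...) of ScheduledFloat hyperparameters such as skip rates and balancer probabilities at the given batch_count; the value follows a schedule over training rather than staying fixed. A sketch of one such schedule as piecewise-linear interpolation over batch count follows; the breakpoints are invented for illustration and ScheduledFloatSketch is not the library class.

    import bisect

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over batch count, flat outside the breakpoints."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs in increasing batch_count order.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value_at(self, batch_count: float) -> float:
            i = bisect.bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate that decays early in training and then stays flat,
    # consistent with the many ans=0.0 skip-rate entries this late in the run:
    skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.0))
    skip_rate.value_at(3949940.0)  # -> 0.0
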
2023-11-29 11:44:49,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3949940.0, ans=0.125
2023-11-29 11:44:51,427 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592500
2023-11-29 11:44:56,111 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 9.045e+01 9.733e+01 1.044e+02 1.292e+02, threshold=1.947e+02, percent-clipped=0.0
2023-11-29 11:45:10,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3950073.3333333335, ans=0.125
2023-11-29 11:45:19,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3950140.0, ans=0.2
2023-11-29 11:45:20,749 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3350, loss[loss=0.08199, simple_loss=0.1164, pruned_loss=0.01676, audio_tagging_loss=0.007013, over 14941.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0885, pruned_loss=0.01171, audio_tagging_loss=0.008813, over 3062063.40 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:45:24,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5
2023-11-29 11:45:39,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=15.0
2023-11-29 11:45:53,624 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592550
2023-11-29 11:46:14,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3950406.6666666665, ans=0.2
2023-11-29 11:46:22,688 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3400, loss[loss=0.07714, simple_loss=0.1109, pruned_loss=0.01287, audio_tagging_loss=0.008829, over 16300.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08856, pruned_loss=0.01169, audio_tagging_loss=0.008566, over 3055497.75 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 11:46:27,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3950473.3333333335, ans=0.125
2023-11-29 11:46:38,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.27 vs. limit=10.0
2023-11-29 11:46:44,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0
2023-11-29 11:46:49,990 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-29 11:46:56,810 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592600
2023-11-29 11:46:58,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.83 vs.
limit=22.5 2023-11-29 11:47:01,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.007e+01 9.772e+01 1.033e+02 1.333e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 11:47:17,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3950740.0, ans=0.0 2023-11-29 11:47:24,651 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3450, loss[loss=0.09186, simple_loss=0.1232, pruned_loss=0.02118, audio_tagging_loss=0.009081, over 14448.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08822, pruned_loss=0.01167, audio_tagging_loss=0.008509, over 3046930.92 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:47:32,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3950806.6666666665, ans=0.0 2023-11-29 11:47:32,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3950806.6666666665, ans=0.09899494936611666 2023-11-29 11:47:48,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3950873.3333333335, ans=0.035 2023-11-29 11:47:54,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=22.5 2023-11-29 11:47:55,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3950940.0, ans=0.125 2023-11-29 11:47:58,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592650 2023-11-29 11:48:01,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3951006.6666666665, ans=0.125 2023-11-29 11:48:04,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3951006.6666666665, ans=0.1 2023-11-29 11:48:05,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3951006.6666666665, ans=0.125 2023-11-29 11:48:15,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3951073.3333333335, ans=0.0 2023-11-29 11:48:21,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3951073.3333333335, ans=0.125 2023-11-29 11:48:27,044 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3500, loss[loss=0.07436, simple_loss=0.1077, pruned_loss=0.01301, audio_tagging_loss=0.007508, over 15581.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.0888, pruned_loss=0.01191, audio_tagging_loss=0.008421, over 3043773.36 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:48:58,860 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 11:49:00,103 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592700 2023-11-29 11:49:05,881 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 9.064e+01 9.893e+01 1.052e+02 1.385e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 11:49:10,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3951340.0, ans=0.2 2023-11-29 11:49:11,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3951340.0, ans=0.2 2023-11-29 11:49:29,221 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3550, loss[loss=0.06336, simple_loss=0.09367, pruned_loss=0.007493, audio_tagging_loss=0.009028, over 15647.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.0885, pruned_loss=0.01193, audio_tagging_loss=0.008346, over 3048714.22 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:49:42,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-11-29 11:50:01,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3951606.6666666665, ans=0.1 2023-11-29 11:50:02,782 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592750 2023-11-29 11:50:25,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3951740.0, ans=0.1 2023-11-29 11:50:30,379 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3600, loss[loss=0.06323, simple_loss=0.08637, pruned_loss=0.0116, audio_tagging_loss=0.008442, over 14222.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08878, pruned_loss=0.01196, audio_tagging_loss=0.008338, over 3042374.15 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:50:54,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2023-11-29 11:51:01,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3951940.0, ans=0.125 2023-11-29 11:51:04,621 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592800 2023-11-29 11:51:04,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3951940.0, ans=0.125 2023-11-29 11:51:09,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.101e+01 9.681e+01 1.023e+02 1.277e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 11:51:33,128 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3650, loss[loss=0.05233, simple_loss=0.07322, pruned_loss=0.007079, audio_tagging_loss=0.008639, over 14796.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08941, pruned_loss=0.01198, audio_tagging_loss=0.008291, over 3046854.12 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:51:33,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3952140.0, ans=0.125 2023-11-29 11:51:54,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3952206.6666666665, ans=0.5 2023-11-29 11:52:06,063 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592850 2023-11-29 11:52:07,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3952273.3333333335, ans=0.125 2023-11-29 11:52:08,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3952340.0, ans=0.125 2023-11-29 11:52:10,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3952340.0, ans=0.0 2023-11-29 11:52:20,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0 2023-11-29 11:52:30,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3952406.6666666665, ans=0.0 2023-11-29 11:52:34,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2023-11-29 11:52:35,269 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3700, loss[loss=0.06544, simple_loss=0.09483, pruned_loss=0.008653, audio_tagging_loss=0.009371, over 16003.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08978, pruned_loss=0.01195, audio_tagging_loss=0.008286, over 3044075.25 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:53:08,655 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592900 2023-11-29 11:53:14,442 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 9.192e+01 9.964e+01 1.058e+02 1.278e+02, threshold=1.993e+02, percent-clipped=0.0 2023-11-29 11:53:21,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=15.0 2023-11-29 11:53:26,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3952740.0, ans=0.1 2023-11-29 11:53:36,602 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3750, loss[loss=0.07365, simple_loss=0.1091, pruned_loss=0.01489, audio_tagging_loss=0.004226, over 16801.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08891, pruned_loss=0.01179, audio_tagging_loss=0.00839, over 3049276.47 frames. 
], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:53:36,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3952806.6666666665, ans=0.035 2023-11-29 11:53:46,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3952806.6666666665, ans=0.07 2023-11-29 11:54:03,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3952940.0, ans=0.125 2023-11-29 11:54:09,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3952940.0, ans=0.125 2023-11-29 11:54:10,844 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 592950 2023-11-29 11:54:13,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3953006.6666666665, ans=0.2 2023-11-29 11:54:16,940 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:54:20,513 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:54:37,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3953140.0, ans=0.0 2023-11-29 11:54:38,442 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3800, loss[loss=0.07085, simple_loss=0.1003, pruned_loss=0.01378, audio_tagging_loss=0.006905, over 15272.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08854, pruned_loss=0.01185, audio_tagging_loss=0.008478, over 3044226.28 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:55:06,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3953273.3333333335, ans=0.125 2023-11-29 11:55:07,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3953273.3333333335, ans=0.125 2023-11-29 11:55:12,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593000 2023-11-29 11:55:18,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 9.089e+01 9.885e+01 1.067e+02 1.488e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 11:55:31,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3953406.6666666665, ans=0.125 2023-11-29 11:55:37,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3953406.6666666665, ans=0.0 2023-11-29 11:55:41,785 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3850, loss[loss=0.0661, simple_loss=0.09223, pruned_loss=0.01298, audio_tagging_loss=0.007006, over 15955.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08872, pruned_loss=0.01183, audio_tagging_loss=0.008501, over 3041785.93 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:55:45,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.33 vs. limit=5.0 2023-11-29 11:56:06,854 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:56:14,526 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593050 2023-11-29 11:56:17,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2023-11-29 11:56:31,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3953740.0, ans=0.125 2023-11-29 11:56:43,354 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3900, loss[loss=0.05841, simple_loss=0.0816, pruned_loss=0.009975, audio_tagging_loss=0.007639, over 15735.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08891, pruned_loss=0.01183, audio_tagging_loss=0.008579, over 3045090.82 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:56:44,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5 2023-11-29 11:57:00,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3953873.3333333335, ans=0.125 2023-11-29 11:57:15,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3953940.0, ans=0.0 2023-11-29 11:57:17,568 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593100 2023-11-29 11:57:23,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.927e+01 9.561e+01 1.012e+02 1.625e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-29 11:57:33,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3954073.3333333335, ans=0.0 2023-11-29 11:57:44,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3954140.0, ans=0.125 2023-11-29 11:57:45,087 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 3950, loss[loss=0.0737, simple_loss=0.1048, pruned_loss=0.01375, audio_tagging_loss=0.007526, over 14846.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08941, pruned_loss=0.01196, audio_tagging_loss=0.008493, over 3040866.12 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:57:49,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2023-11-29 11:57:53,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=15.0 2023-11-29 11:57:57,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. 
limit=15.0 2023-11-29 11:58:12,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3954273.3333333335, ans=0.1 2023-11-29 11:58:18,334 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593150 2023-11-29 11:58:22,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3954340.0, ans=0.0 2023-11-29 11:58:47,950 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4000, loss[loss=0.06323, simple_loss=0.08242, pruned_loss=0.01351, audio_tagging_loss=0.008508, over 14899.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08922, pruned_loss=0.01181, audio_tagging_loss=0.0086, over 3039128.29 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:58:48,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3954473.3333333335, ans=0.125 2023-11-29 11:58:54,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3954473.3333333335, ans=0.125 2023-11-29 11:58:55,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3954473.3333333335, ans=15.0 2023-11-29 11:59:02,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3954540.0, ans=0.025 2023-11-29 11:59:15,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3954606.6666666665, ans=0.0 2023-11-29 11:59:20,393 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593200 2023-11-29 11:59:25,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3954673.3333333335, ans=0.0 2023-11-29 11:59:26,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.870e+01 8.877e+01 9.527e+01 1.031e+02 1.352e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 11:59:41,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3954740.0, ans=0.1 2023-11-29 11:59:49,339 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4050, loss[loss=0.06091, simple_loss=0.08726, pruned_loss=0.008317, audio_tagging_loss=0.008967, over 15133.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08885, pruned_loss=0.01168, audio_tagging_loss=0.008656, over 3034837.43 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:59:54,004 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 11:59:54,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3954806.6666666665, ans=0.0 2023-11-29 11:59:54,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3954806.6666666665, ans=0.125 2023-11-29 12:00:22,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593250 2023-11-29 12:00:23,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3954940.0, ans=0.125 2023-11-29 12:00:24,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-29 12:00:51,345 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4100, loss[loss=0.06029, simple_loss=0.07594, pruned_loss=0.01045, audio_tagging_loss=0.01187, over 13614.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08944, pruned_loss=0.01167, audio_tagging_loss=0.008626, over 3042815.05 frames. ], batch size: 50, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:00:51,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3955140.0, ans=0.125 2023-11-29 12:00:51,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3955140.0, ans=0.125 2023-11-29 12:00:52,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3955140.0, ans=0.125 2023-11-29 12:01:19,217 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:01:24,883 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593300 2023-11-29 12:01:28,978 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-11-29 12:01:31,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 9.221e+01 9.823e+01 1.065e+02 1.481e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-29 12:01:41,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3955406.6666666665, ans=0.04949747468305833 2023-11-29 12:01:42,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3955406.6666666665, ans=0.07 2023-11-29 12:01:46,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3955406.6666666665, ans=0.125 2023-11-29 12:01:52,925 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4150, loss[loss=0.07433, simple_loss=0.1027, pruned_loss=0.01397, audio_tagging_loss=0.009005, over 15419.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08987, pruned_loss=0.01174, audio_tagging_loss=0.008494, over 3038010.85 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:01:55,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3955473.3333333335, ans=0.125 2023-11-29 12:01:58,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3955473.3333333335, ans=0.125 2023-11-29 12:02:07,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3955540.0, ans=0.0 2023-11-29 12:02:10,762 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:02:19,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.06 vs. limit=22.5 2023-11-29 12:02:23,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3955606.6666666665, ans=0.125 2023-11-29 12:02:26,200 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593350 2023-11-29 12:02:26,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3955606.6666666665, ans=0.2 2023-11-29 12:02:33,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3955673.3333333335, ans=0.125 2023-11-29 12:02:34,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3955673.3333333335, ans=0.1 2023-11-29 12:02:37,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3955673.3333333335, ans=0.1 2023-11-29 12:02:38,414 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:02:43,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3955740.0, ans=0.1 2023-11-29 12:02:46,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2023-11-29 12:02:48,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3955740.0, ans=0.125 2023-11-29 12:02:54,778 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4200, loss[loss=0.06765, simple_loss=0.09541, pruned_loss=0.01269, audio_tagging_loss=0.007261, over 14867.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.09, pruned_loss=0.01171, audio_tagging_loss=0.008453, over 3036912.30 frames. 
], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:02:55,110 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:03:28,425 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593400 2023-11-29 12:03:29,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3955940.0, ans=0.125 2023-11-29 12:03:35,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.099e+01 9.882e+01 1.051e+02 1.333e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 12:03:39,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3956006.6666666665, ans=0.1 2023-11-29 12:03:44,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3956073.3333333335, ans=0.025 2023-11-29 12:03:56,518 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4250, loss[loss=0.06283, simple_loss=0.08814, pruned_loss=0.0103, audio_tagging_loss=0.008466, over 15104.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08991, pruned_loss=0.01164, audio_tagging_loss=0.008377, over 3041188.45 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:04:04,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2023-11-29 12:04:30,816 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593450 2023-11-29 12:04:41,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3956340.0, ans=0.07 2023-11-29 12:04:48,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2023-11-29 12:04:52,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3956406.6666666665, ans=0.125 2023-11-29 12:04:58,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-11-29 12:04:58,929 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4300, loss[loss=0.07305, simple_loss=0.09815, pruned_loss=0.01571, audio_tagging_loss=0.008261, over 15840.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.09022, pruned_loss=0.01167, audio_tagging_loss=0.008307, over 3044602.67 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 12:04:59,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. 
limit=15.0 2023-11-29 12:05:05,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3956473.3333333335, ans=0.125 2023-11-29 12:05:19,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3956540.0, ans=0.0 2023-11-29 12:05:22,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3956606.6666666665, ans=0.0 2023-11-29 12:05:31,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593500 2023-11-29 12:05:36,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3956673.3333333335, ans=0.125 2023-11-29 12:05:40,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 9.077e+01 9.622e+01 1.047e+02 1.414e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 12:05:52,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2023-11-29 12:05:53,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0 2023-11-29 12:05:59,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3956806.6666666665, ans=0.125 2023-11-29 12:06:00,181 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4350, loss[loss=0.07046, simple_loss=0.09423, pruned_loss=0.01445, audio_tagging_loss=0.008894, over 14607.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09107, pruned_loss=0.01194, audio_tagging_loss=0.008244, over 3046712.09 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 12:06:08,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3956806.6666666665, ans=0.0 2023-11-29 12:06:17,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3956873.3333333335, ans=0.125 2023-11-29 12:06:20,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3956873.3333333335, ans=0.1 2023-11-29 12:06:33,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593550 2023-11-29 12:06:55,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3957073.3333333335, ans=0.125 2023-11-29 12:06:59,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=12.0 2023-11-29 12:07:01,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3957140.0, ans=0.125 2023-11-29 12:07:02,028 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4400, loss[loss=0.04817, simple_loss=0.06113, pruned_loss=0.007255, audio_tagging_loss=0.01035, over 14027.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08959, pruned_loss=0.01175, audio_tagging_loss=0.008348, over 3042538.69 frames. 
], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:07:16,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3957206.6666666665, ans=0.125 2023-11-29 12:07:24,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3957206.6666666665, ans=0.125 2023-11-29 12:07:36,019 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0 2023-11-29 12:07:36,507 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593600 2023-11-29 12:07:45,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 9.188e+01 9.758e+01 1.053e+02 1.476e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-29 12:07:48,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3957340.0, ans=0.1 2023-11-29 12:08:03,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3957406.6666666665, ans=0.1 2023-11-29 12:08:05,328 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4450, loss[loss=0.07093, simple_loss=0.1005, pruned_loss=0.01499, audio_tagging_loss=0.005675, over 15022.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08938, pruned_loss=0.01163, audio_tagging_loss=0.008344, over 3049756.72 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:08:20,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3957540.0, ans=10.0 2023-11-29 12:08:25,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3957540.0, ans=0.0 2023-11-29 12:08:38,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593650 2023-11-29 12:08:39,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3957606.6666666665, ans=0.0 2023-11-29 12:08:43,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2023-11-29 12:08:56,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3957740.0, ans=0.1 2023-11-29 12:09:07,775 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4500, loss[loss=0.06325, simple_loss=0.08276, pruned_loss=0.01346, audio_tagging_loss=0.008416, over 14191.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.09021, pruned_loss=0.01191, audio_tagging_loss=0.008373, over 3049438.75 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:09:15,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3957806.6666666665, ans=0.2 2023-11-29 12:09:33,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3957940.0, ans=0.09899494936611666 2023-11-29 12:09:41,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593700 2023-11-29 12:09:47,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. 
limit=15.0 2023-11-29 12:09:50,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 9.149e+01 9.833e+01 1.069e+02 1.731e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-29 12:09:53,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3958006.6666666665, ans=0.2 2023-11-29 12:09:55,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3958006.6666666665, ans=0.125 2023-11-29 12:10:04,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3958073.3333333335, ans=0.125 2023-11-29 12:10:04,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0 2023-11-29 12:10:08,733 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4550, loss[loss=0.06151, simple_loss=0.08663, pruned_loss=0.0105, audio_tagging_loss=0.007697, over 14725.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08965, pruned_loss=0.01176, audio_tagging_loss=0.008409, over 3044639.23 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:10:10,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3958140.0, ans=0.125 2023-11-29 12:10:34,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3958273.3333333335, ans=0.125 2023-11-29 12:10:43,116 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593750 2023-11-29 12:10:47,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0 2023-11-29 12:10:52,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3958340.0, ans=0.125 2023-11-29 12:10:57,174 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:11:05,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3958406.6666666665, ans=0.125 2023-11-29 12:11:05,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3958406.6666666665, ans=0.125 2023-11-29 12:11:11,236 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4600, loss[loss=0.04914, simple_loss=0.05265, pruned_loss=0.01003, audio_tagging_loss=0.01279, over 14521.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.0886, pruned_loss=0.01169, audio_tagging_loss=0.008474, over 3041885.06 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:11:12,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.47 vs. 
limit=15.0 2023-11-29 12:11:25,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3958540.0, ans=0.125 2023-11-29 12:11:36,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2023-11-29 12:11:37,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3958606.6666666665, ans=0.125 2023-11-29 12:11:44,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593800 2023-11-29 12:11:53,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 9.081e+01 9.672e+01 1.036e+02 1.224e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 12:12:00,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3958740.0, ans=0.125 2023-11-29 12:12:04,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2023-11-29 12:12:11,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3958740.0, ans=0.0 2023-11-29 12:12:11,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3958740.0, ans=0.0 2023-11-29 12:12:13,806 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4650, loss[loss=0.05995, simple_loss=0.08454, pruned_loss=0.007831, audio_tagging_loss=0.009847, over 16290.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08861, pruned_loss=0.01176, audio_tagging_loss=0.008574, over 3041951.86 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:12:16,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3958806.6666666665, ans=0.125 2023-11-29 12:12:31,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs. limit=5.0 2023-11-29 12:12:46,249 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593850 2023-11-29 12:12:48,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3958940.0, ans=0.05 2023-11-29 12:12:50,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3959006.6666666665, ans=0.0 2023-11-29 12:13:14,350 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4700, loss[loss=0.05898, simple_loss=0.08254, pruned_loss=0.01077, audio_tagging_loss=0.00694, over 14861.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08881, pruned_loss=0.01168, audio_tagging_loss=0.008696, over 3050088.17 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:13:28,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3959206.6666666665, ans=0.1 2023-11-29 12:13:44,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.27 vs. 
limit=15.0 2023-11-29 12:13:48,588 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593900 2023-11-29 12:13:53,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3959340.0, ans=0.125 2023-11-29 12:13:56,713 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.196e+01 9.820e+01 1.091e+02 1.389e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-29 12:14:16,791 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4750, loss[loss=0.04455, simple_loss=0.05649, pruned_loss=0.005933, audio_tagging_loss=0.01037, over 14564.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08835, pruned_loss=0.01174, audio_tagging_loss=0.008739, over 3039681.92 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:14:19,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3959473.3333333335, ans=0.0 2023-11-29 12:14:19,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3959473.3333333335, ans=0.0 2023-11-29 12:14:39,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3959540.0, ans=0.125 2023-11-29 12:14:49,587 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 593950 2023-11-29 12:14:55,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3959673.3333333335, ans=0.125 2023-11-29 12:15:19,315 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4800, loss[loss=0.04421, simple_loss=0.06342, pruned_loss=0.005588, audio_tagging_loss=0.006914, over 14735.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08818, pruned_loss=0.01163, audio_tagging_loss=0.008766, over 3037725.52 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:15:30,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3959873.3333333335, ans=0.0 2023-11-29 12:15:52,339 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594000 2023-11-29 12:15:52,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5 2023-11-29 12:16:01,790 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 9.011e+01 9.691e+01 1.047e+02 1.422e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 12:16:03,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3960006.6666666665, ans=0.0 2023-11-29 12:16:18,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0 2023-11-29 12:16:20,288 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4850, loss[loss=0.05523, simple_loss=0.0688, pruned_loss=0.01052, audio_tagging_loss=0.01031, over 13728.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08808, pruned_loss=0.01175, audio_tagging_loss=0.008929, over 3036161.88 frames. 
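The ScheduledFloat records above (scaling.py:213) report module hyper-parameters such as dropout_p and balancer probabilities that are looked up against the global batch_count rather than held fixed; by batch_count ≈ 3.96e6 most of them have settled at their final values (ans=0.1, ans=0.125, ...). Below is a minimal sketch of such a batch-count-keyed schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name and the breakpoints are illustrative, not the actual scaling.py implementation.

```python
# Illustrative sketch: a float hyper-parameter annealed piecewise-linearly
# against the global batch count, in the spirit of the
# "ScheduledFloat: name=..., batch_count=..., ans=..." records above.
from bisect import bisect_right

class PiecewiseLinearFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]       # before the first breakpoint
        if i == len(self.xs):
            return self.ys[-1]      # past the last breakpoint: hold flat
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a dropout_p decaying from 0.3 to 0.1 over the first 20k batches;
# by batch_count=3957340.0 it has long since flattened out at 0.1.
dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3957340.0))   # -> 0.1, matching "ans=0.1" above
```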
], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:16:21,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3960140.0, ans=0.125 2023-11-29 12:16:25,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3960140.0, ans=0.09899494936611666 2023-11-29 12:16:37,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3960206.6666666665, ans=0.125 2023-11-29 12:16:54,294 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594050 2023-11-29 12:17:01,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3960340.0, ans=0.1 2023-11-29 12:17:21,447 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4900, loss[loss=0.06887, simple_loss=0.1012, pruned_loss=0.01061, audio_tagging_loss=0.007662, over 14286.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0892, pruned_loss=0.01204, audio_tagging_loss=0.008837, over 3034161.78 frames. ], batch size: 52, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:17:26,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3960473.3333333335, ans=0.0 2023-11-29 12:17:39,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.91 vs. limit=15.0 2023-11-29 12:17:43,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3960540.0, ans=0.2 2023-11-29 12:17:53,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3960606.6666666665, ans=0.125 2023-11-29 12:17:55,262 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594100 2023-11-29 12:18:04,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 9.116e+01 9.769e+01 1.041e+02 2.380e+02, threshold=1.954e+02, percent-clipped=1.0 2023-11-29 12:18:06,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3960673.3333333335, ans=0.125 2023-11-29 12:18:24,988 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 4950, loss[loss=0.06652, simple_loss=0.09365, pruned_loss=0.01279, audio_tagging_loss=0.006901, over 14912.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08918, pruned_loss=0.01203, audio_tagging_loss=0.008651, over 3035333.63 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:18:26,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=12.0 2023-11-29 12:18:40,585 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:18:57,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594150 2023-11-29 12:19:26,299 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5000, loss[loss=0.0835, simple_loss=0.1258, pruned_loss=0.01459, audio_tagging_loss=0.006012, over 16351.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08934, pruned_loss=0.01187, audio_tagging_loss=0.008525, over 3031510.51 frames. 
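In every optim.py:476 record in this log the reported threshold equals Clipping_scale times the middle quartile (the median) of the recent gradient norms: in the most recent record above, 2.0 * 9.769e+01 = 1.954e+02, and percent-clipped=1.0 is consistent with the largest recent norm (2.380e+02) exceeding that threshold. A sketch of median-based adaptive clipping under that reading follows; the buffer size and class API are invented here, and the real optim.py bookkeeping inside the optimizer step may differ.

```python
# Sketch: adaptive gradient clipping where the threshold tracks
# clipping_scale * median of recent gradient norms, matching the
# "grad-norm quartiles ... threshold=..." arithmetic in this log.
from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)   # recent total grad norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()   # scale * median
        if norm > threshold:                 # rescale gradients in place
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm
```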
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:19:28,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3961140.0, ans=0.0 2023-11-29 12:19:32,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3961140.0, ans=0.0 2023-11-29 12:19:58,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3961273.3333333335, ans=0.0 2023-11-29 12:19:59,597 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594200 2023-11-29 12:20:09,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.950e+01 9.411e+01 1.015e+02 1.285e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 12:20:15,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2023-11-29 12:20:27,695 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5050, loss[loss=0.05902, simple_loss=0.07696, pruned_loss=0.009814, audio_tagging_loss=0.01073, over 14614.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08855, pruned_loss=0.01179, audio_tagging_loss=0.008532, over 3034192.23 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:20:28,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3961473.3333333335, ans=0.125 2023-11-29 12:20:32,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3961473.3333333335, ans=0.2 2023-11-29 12:20:33,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3961473.3333333335, ans=0.125 2023-11-29 12:20:51,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3961540.0, ans=0.125 2023-11-29 12:20:54,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-11-29 12:21:01,466 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594250 2023-11-29 12:21:30,065 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5100, loss[loss=0.06935, simple_loss=0.09407, pruned_loss=0.01504, audio_tagging_loss=0.007274, over 15460.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08836, pruned_loss=0.0118, audio_tagging_loss=0.008461, over 3034148.39 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:21:45,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. 
limit=6.0 2023-11-29 12:21:47,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3961873.3333333335, ans=0.125 2023-11-29 12:21:59,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3961940.0, ans=0.125 2023-11-29 12:22:02,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3961940.0, ans=0.125 2023-11-29 12:22:03,842 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594300 2023-11-29 12:22:05,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3961940.0, ans=0.125 2023-11-29 12:22:13,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.985e+01 9.588e+01 1.015e+02 1.337e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 12:22:22,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2023-11-29 12:22:32,653 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5150, loss[loss=0.06355, simple_loss=0.08895, pruned_loss=0.01234, audio_tagging_loss=0.006735, over 15667.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08856, pruned_loss=0.0118, audio_tagging_loss=0.008407, over 3038549.72 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:22:35,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3962140.0, ans=0.1 2023-11-29 12:23:06,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594350 2023-11-29 12:23:11,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3962340.0, ans=0.125 2023-11-29 12:23:14,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3962340.0, ans=0.1 2023-11-29 12:23:34,647 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5200, loss[loss=0.06783, simple_loss=0.09426, pruned_loss=0.01081, audio_tagging_loss=0.009893, over 14445.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08837, pruned_loss=0.01179, audio_tagging_loss=0.008333, over 3041002.36 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:23:43,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3962473.3333333335, ans=0.07 2023-11-29 12:23:48,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. 
limit=22.5 2023-11-29 12:23:49,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3962540.0, ans=0.0 2023-11-29 12:23:55,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3962540.0, ans=0.0 2023-11-29 12:24:08,847 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594400 2023-11-29 12:24:18,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.283e+01 9.729e+01 1.049e+02 1.320e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 12:24:22,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3962673.3333333335, ans=0.2 2023-11-29 12:24:24,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3962740.0, ans=0.125 2023-11-29 12:24:33,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3962740.0, ans=0.2 2023-11-29 12:24:33,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3962740.0, ans=0.0 2023-11-29 12:24:35,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=12.0 2023-11-29 12:24:36,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3962806.6666666665, ans=0.1 2023-11-29 12:24:37,191 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5250, loss[loss=0.04331, simple_loss=0.0537, pruned_loss=0.006871, audio_tagging_loss=0.00959, over 17294.00 frames. ], tot_loss[loss=0.06393, simple_loss=0.08773, pruned_loss=0.01171, audio_tagging_loss=0.008353, over 3048216.75 frames. ], batch size: 68, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:24:47,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3962806.6666666665, ans=0.2 2023-11-29 12:24:48,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3962806.6666666665, ans=0.0 2023-11-29 12:24:50,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3962873.3333333335, ans=0.125 2023-11-29 12:25:10,169 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594450 2023-11-29 12:25:17,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.56 vs. limit=22.5 2023-11-29 12:25:22,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3963006.6666666665, ans=0.2 2023-11-29 12:25:39,445 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5300, loss[loss=0.07152, simple_loss=0.0971, pruned_loss=0.0117, audio_tagging_loss=0.01127, over 14646.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08873, pruned_loss=0.0119, audio_tagging_loss=0.008337, over 3048086.40 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:25:42,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.02 vs. 
limit=22.5 2023-11-29 12:25:45,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3963140.0, ans=0.125 2023-11-29 12:26:09,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3963273.3333333335, ans=0.2 2023-11-29 12:26:13,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594500 2023-11-29 12:26:19,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3963340.0, ans=0.0 2023-11-29 12:26:22,670 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.148e+01 9.632e+01 1.017e+02 1.264e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 12:26:39,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.86 vs. limit=22.5 2023-11-29 12:26:41,261 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5350, loss[loss=0.07379, simple_loss=0.1117, pruned_loss=0.01219, audio_tagging_loss=0.005733, over 15505.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08948, pruned_loss=0.0119, audio_tagging_loss=0.008299, over 3047201.92 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:27:07,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3963606.6666666665, ans=0.0 2023-11-29 12:27:15,402 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594550 2023-11-29 12:27:32,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3963740.0, ans=0.1 2023-11-29 12:27:43,660 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5400, loss[loss=0.05235, simple_loss=0.06848, pruned_loss=0.01004, audio_tagging_loss=0.008064, over 15435.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08972, pruned_loss=0.01191, audio_tagging_loss=0.008365, over 3043648.10 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:28:03,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3963873.3333333335, ans=0.1 2023-11-29 12:28:04,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-11-29 12:28:16,327 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594600 2023-11-29 12:28:20,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3964006.6666666665, ans=0.2 2023-11-29 12:28:22,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. 
limit=10.0 2023-11-29 12:28:23,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3964006.6666666665, ans=0.0 2023-11-29 12:28:23,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3964006.6666666665, ans=0.125 2023-11-29 12:28:26,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 9.096e+01 9.650e+01 1.029e+02 1.446e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 12:28:45,292 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5450, loss[loss=0.05285, simple_loss=0.07662, pruned_loss=0.008779, audio_tagging_loss=0.005758, over 15342.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08936, pruned_loss=0.01173, audio_tagging_loss=0.008452, over 3043063.87 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:28:49,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3964140.0, ans=0.1 2023-11-29 12:28:54,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3964140.0, ans=0.1 2023-11-29 12:29:07,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3964206.6666666665, ans=0.125 2023-11-29 12:29:19,044 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594650 2023-11-29 12:29:47,556 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5500, loss[loss=0.06108, simple_loss=0.0892, pruned_loss=0.009237, audio_tagging_loss=0.007245, over 16463.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08947, pruned_loss=0.01183, audio_tagging_loss=0.008535, over 3044688.90 frames. ], batch size: 66, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:30:21,168 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594700 2023-11-29 12:30:21,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-11-29 12:30:22,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3964606.6666666665, ans=0.125 2023-11-29 12:30:32,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 9.260e+01 9.828e+01 1.052e+02 2.145e+02, threshold=1.966e+02, percent-clipped=1.0 2023-11-29 12:30:41,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3964740.0, ans=0.125 2023-11-29 12:30:49,513 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5550, loss[loss=0.06544, simple_loss=0.09298, pruned_loss=0.008964, audio_tagging_loss=0.009985, over 15609.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08987, pruned_loss=0.01174, audio_tagging_loss=0.008656, over 3050067.30 frames. 
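The loss fields in these records are internally consistent with one fixed linear combination, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss: for the tot_loss just above, 0.5 * 0.08987 + 0.01174 + 0.008656 = 0.06533. A small helper reproduces that combination; the 0.5 and 1.0 weights are read off the logged numbers rather than taken from the training code.

```python
# Combine the pruned-transducer losses with the audio-tagging loss using
# the weights implied by the logged numbers.
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
    return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

# Check against the tot_loss record above (Epoch 50, batch 5550):
assert abs(combined_loss(0.08987, 0.01174, 0.008656) - 0.06533) < 1e-4
```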
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:31:11,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3964873.3333333335, ans=0.2 2023-11-29 12:31:12,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3964940.0, ans=0.0 2023-11-29 12:31:22,529 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594750 2023-11-29 12:31:37,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3965073.3333333335, ans=0.125 2023-11-29 12:31:52,115 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5600, loss[loss=0.05019, simple_loss=0.07796, pruned_loss=0.003946, audio_tagging_loss=0.007263, over 15444.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08975, pruned_loss=0.01169, audio_tagging_loss=0.008701, over 3055184.21 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:31:55,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3965140.0, ans=0.125 2023-11-29 12:32:16,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3965273.3333333335, ans=0.2 2023-11-29 12:32:20,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3965273.3333333335, ans=0.0 2023-11-29 12:32:25,833 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594800 2023-11-29 12:32:37,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 9.303e+01 9.793e+01 1.041e+02 1.252e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 12:32:38,535 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:32:39,838 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:32:53,625 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5650, loss[loss=0.06526, simple_loss=0.0989, pruned_loss=0.008701, audio_tagging_loss=0.007104, over 15916.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08925, pruned_loss=0.01161, audio_tagging_loss=0.008763, over 3051056.99 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:33:00,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3965473.3333333335, ans=0.1 2023-11-29 12:33:28,091 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594850 2023-11-29 12:33:35,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3965673.3333333335, ans=0.1 2023-11-29 12:33:56,398 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5700, loss[loss=0.05866, simple_loss=0.08452, pruned_loss=0.009025, audio_tagging_loss=0.00737, over 15225.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08882, pruned_loss=0.01148, audio_tagging_loss=0.008806, over 3050452.59 frames. 
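The WARNING above (train_asr.py:1481) drops a one-second AudioSet placeholder cut because its transcript has more BPE tokens (24) than encoder frames survive subsampling (100 before, 23 after), which would leave the transducer loss undefined. A sketch of that kind of pre-filter follows; the subsampling formula reproduces 100 -> 23 but is an assumed stand-in for the real encoder_embed arithmetic, and keep_cut/sp are illustrative names.

```python
# Sketch: drop cuts whose token sequence cannot fit into the subsampled
# frame sequence, as in the "Exclude cut with ID ..." warnings above.
import logging

def frames_after_subsampling(num_frames: int) -> int:
    # assumed stand-in for the conv front-end arithmetic; maps 100 -> 23
    return ((num_frames - 7) // 2) // 2

def keep_cut(cut, sp) -> bool:
    """sp is a SentencePiece processor; cut is a lhotse cut."""
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    t = frames_after_subsampling(cut.num_frames)
    if t < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Frames after subsampling: {t}. Tokens: {len(tokens)}")
        return False
    return True

# train_cuts = train_cuts.filter(lambda c: keep_cut(c, sp))
```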
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:34:20,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=12.0 2023-11-29 12:34:22,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3965940.0, ans=0.0 2023-11-29 12:34:29,229 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594900 2023-11-29 12:34:30,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3965940.0, ans=0.0 2023-11-29 12:34:32,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3966006.6666666665, ans=0.125 2023-11-29 12:34:33,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0 2023-11-29 12:34:40,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.907e+01 9.442e+01 9.916e+01 1.221e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 12:34:52,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3966073.3333333335, ans=0.0 2023-11-29 12:34:58,491 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5750, loss[loss=0.06394, simple_loss=0.08901, pruned_loss=0.01236, audio_tagging_loss=0.007074, over 14216.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08889, pruned_loss=0.01168, audio_tagging_loss=0.008715, over 3044522.90 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:35:18,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3966206.6666666665, ans=0.0 2023-11-29 12:35:19,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.13 vs. limit=15.0 2023-11-29 12:35:26,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3966273.3333333335, ans=0.0 2023-11-29 12:35:29,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3966273.3333333335, ans=0.0 2023-11-29 12:35:31,955 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 594950 2023-11-29 12:35:33,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.07 vs. limit=22.5 2023-11-29 12:35:42,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3966340.0, ans=0.125 2023-11-29 12:35:48,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3966406.6666666665, ans=0.1 2023-11-29 12:36:00,172 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5800, loss[loss=0.07429, simple_loss=0.1054, pruned_loss=0.01559, audio_tagging_loss=0.006013, over 15629.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08863, pruned_loss=0.01177, audio_tagging_loss=0.008636, over 3043526.98 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:36:09,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3966473.3333333335, ans=0.0 2023-11-29 12:36:14,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2023-11-29 12:36:28,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3966606.6666666665, ans=0.125 2023-11-29 12:36:34,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595000 2023-11-29 12:36:45,838 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 9.166e+01 9.851e+01 1.059e+02 1.504e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 12:37:01,716 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5850, loss[loss=0.05449, simple_loss=0.07966, pruned_loss=0.006391, audio_tagging_loss=0.008266, over 15551.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08828, pruned_loss=0.01185, audio_tagging_loss=0.008571, over 3042635.04 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:37:02,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3966806.6666666665, ans=0.0 2023-11-29 12:37:13,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3966873.3333333335, ans=0.025 2023-11-29 12:37:25,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3966940.0, ans=0.125 2023-11-29 12:37:34,413 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595050 2023-11-29 12:37:41,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3967006.6666666665, ans=0.125 2023-11-29 12:37:59,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.82 vs. limit=22.5 2023-11-29 12:38:03,790 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5900, loss[loss=0.05568, simple_loss=0.07785, pruned_loss=0.009143, audio_tagging_loss=0.007614, over 15229.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08862, pruned_loss=0.01175, audio_tagging_loss=0.008544, over 3042321.13 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:38:26,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3967273.3333333335, ans=0.125 2023-11-29 12:38:36,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595100 2023-11-29 12:38:49,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.143e+01 9.896e+01 1.087e+02 1.374e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 12:39:04,743 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 5950, loss[loss=0.04704, simple_loss=0.05882, pruned_loss=0.00661, audio_tagging_loss=0.01102, over 15596.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08864, pruned_loss=0.01185, audio_tagging_loss=0.008475, over 3049983.23 frames. 
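The Whitening records (scaling.py:1022) compare a measured whiteness statistic of a module's activations against a scheduled limit (e.g. metric=18.82 vs. limit=22.5 above); the comparison only matters while the metric exceeds the limit. One plausible single-group statistic, assumed here rather than taken from scaling.py, is d * tr(C^2) / tr(C)^2 for the channel covariance C: it equals 1.0 for perfectly isotropic activations and approaches num_channels as the covariance collapses toward rank one.

```python
# Sketch of an assumed whiteness metric in the spirit of the
# "Whitening: ... metric=... vs. limit=..." records above.
import torch

def whiteness_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations, single group
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]              # channel covariance C
    d = cov.shape[0]
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

x = torch.randn(1000, 256)                    # nearly white activations
print(whiteness_metric(x))                    # close to the 1.0 floor
```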
], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:39:21,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=15.0 2023-11-29 12:39:23,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3967540.0, ans=0.2 2023-11-29 12:39:38,907 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595150 2023-11-29 12:39:42,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3967673.3333333335, ans=0.125 2023-11-29 12:39:52,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3967673.3333333335, ans=6.0 2023-11-29 12:40:06,615 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6000, loss[loss=0.0646, simple_loss=0.08573, pruned_loss=0.01335, audio_tagging_loss=0.008396, over 13531.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.0884, pruned_loss=0.01186, audio_tagging_loss=0.008436, over 3046198.35 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:40:06,616 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 12:40:46,465 INFO [train_asr.py:1267] (3/4) Epoch 50, validation: loss=0.05775, simple_loss=0.05043, pruned_loss=0.005339, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-29 12:40:46,466 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 12:40:46,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3967806.6666666665, ans=0.95 2023-11-29 12:40:49,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3967806.6666666665, ans=0.1 2023-11-29 12:40:50,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3967806.6666666665, ans=0.1 2023-11-29 12:41:18,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595200 2023-11-29 12:41:32,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.848e+01 9.055e+01 9.788e+01 1.026e+02 1.358e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 12:41:32,916 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:41:46,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3968073.3333333335, ans=0.125 2023-11-29 12:41:48,265 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6050, loss[loss=0.05787, simple_loss=0.08167, pruned_loss=0.006469, audio_tagging_loss=0.01056, over 15007.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08846, pruned_loss=0.01197, audio_tagging_loss=0.008448, over 3054048.54 frames. 
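At batch 6000 the loop above pauses to compute the validation loss over the full dev set (loss=0.05775 over 4681554 frames) and then reports the peak CUDA memory so far. A minimal sketch of such a periodic validation pass; model, valid_dl and the (loss, num_frames) return interface are placeholders, not the train_asr.py signatures.

```python
# Sketch: frame-weighted validation loss plus a peak-memory report,
# mirroring the "Computing validation loss ... Maximum memory allocated"
# sequence above. All names here are placeholders.
import torch

def compute_validation_loss(model, valid_dl, device) -> float:
    model.eval()
    loss_sum, frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = model(batch)    # assumed interface
            loss_sum += loss.item() * num_frames
            frames += num_frames
    model.train()
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
    return loss_sum / frames
```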
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:41:48,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3968140.0, ans=0.125 2023-11-29 12:41:48,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3968140.0, ans=0.0 2023-11-29 12:42:08,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3968206.6666666665, ans=0.0 2023-11-29 12:42:17,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3968273.3333333335, ans=0.2 2023-11-29 12:42:17,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3968273.3333333335, ans=0.0 2023-11-29 12:42:21,681 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595250 2023-11-29 12:42:21,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3968273.3333333335, ans=10.0 2023-11-29 12:42:30,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3968340.0, ans=0.125 2023-11-29 12:42:42,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3968406.6666666665, ans=0.0 2023-11-29 12:42:43,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3968406.6666666665, ans=0.0 2023-11-29 12:42:49,419 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6100, loss[loss=0.07066, simple_loss=0.09516, pruned_loss=0.01382, audio_tagging_loss=0.009256, over 14671.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08775, pruned_loss=0.01167, audio_tagging_loss=0.008455, over 3049479.68 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:42:57,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2023-11-29 12:43:11,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3968540.0, ans=0.125 2023-11-29 12:43:12,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3968540.0, ans=0.2 2023-11-29 12:43:22,741 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595300 2023-11-29 12:43:25,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3968673.3333333335, ans=0.0 2023-11-29 12:43:35,518 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 9.140e+01 9.748e+01 1.061e+02 1.283e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 12:43:51,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3968806.6666666665, ans=0.0 2023-11-29 12:43:52,128 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6150, loss[loss=0.07875, simple_loss=0.1118, pruned_loss=0.01639, audio_tagging_loss=0.006434, over 15616.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08832, pruned_loss=0.01183, audio_tagging_loss=0.00847, over 3051634.60 frames. 
], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:44:00,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2023-11-29 12:44:13,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3968873.3333333335, ans=0.125 2023-11-29 12:44:21,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2023-11-29 12:44:24,716 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595350 2023-11-29 12:44:52,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3969140.0, ans=0.2 2023-11-29 12:44:53,560 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6200, loss[loss=0.06393, simple_loss=0.08957, pruned_loss=0.01013, audio_tagging_loss=0.00902, over 14456.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08916, pruned_loss=0.01193, audio_tagging_loss=0.008506, over 3047342.52 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:45:03,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3969140.0, ans=0.125 2023-11-29 12:45:26,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3969273.3333333335, ans=0.125 2023-11-29 12:45:27,158 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595400 2023-11-29 12:45:39,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 9.032e+01 9.591e+01 1.015e+02 1.293e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 12:45:50,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3969406.6666666665, ans=0.0 2023-11-29 12:45:52,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3969406.6666666665, ans=0.125 2023-11-29 12:45:52,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3969406.6666666665, ans=0.0 2023-11-29 12:45:55,748 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6250, loss[loss=0.07114, simple_loss=0.09289, pruned_loss=0.01621, audio_tagging_loss=0.008488, over 14531.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08851, pruned_loss=0.01197, audio_tagging_loss=0.008682, over 3045199.87 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:46:29,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595450 2023-11-29 12:46:57,360 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6300, loss[loss=0.06542, simple_loss=0.09539, pruned_loss=0.009647, audio_tagging_loss=0.008078, over 14958.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08872, pruned_loss=0.01186, audio_tagging_loss=0.008719, over 3047878.69 frames. 
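The tot_loss entries are reported "over" roughly 3.0M frames even though each batch contributes only about 15k, which points to an exponentially decayed running sum with a horizon of about 200 batches (200 * ~15k ≈ 3.0M), so the frame count hovers near a steady state instead of growing. A sketch of that kind of tracker follows; the recurrence and the horizon are inferred from the logged numbers, not copied from the code.

```python
# Sketch: exponentially decayed loss/frame tracker whose effective frame
# count settles near horizon * frames_per_batch, as in the tot_loss lines.
class DecayingStats:
    def __init__(self, horizon: int = 200):       # inferred, not from code
        self.decay = 1.0 - 1.0 / horizon
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def loss(self) -> float:                      # the reported tot_loss
        return self.loss_sum / max(self.frames, 1.0)
```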
], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:47:01,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3969806.6666666665, ans=0.125 2023-11-29 12:47:24,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3969940.0, ans=0.125 2023-11-29 12:47:26,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3969940.0, ans=0.125 2023-11-29 12:47:31,211 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595500 2023-11-29 12:47:31,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3969940.0, ans=0.1 2023-11-29 12:47:34,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3970006.6666666665, ans=0.5 2023-11-29 12:47:44,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.915e+01 9.519e+01 1.033e+02 1.401e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-29 12:47:51,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=12.0 2023-11-29 12:47:53,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3970073.3333333335, ans=0.0 2023-11-29 12:47:55,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5 2023-11-29 12:47:59,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3970140.0, ans=0.125 2023-11-29 12:47:59,903 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6350, loss[loss=0.08044, simple_loss=0.112, pruned_loss=0.01699, audio_tagging_loss=0.007461, over 15483.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08908, pruned_loss=0.01197, audio_tagging_loss=0.008704, over 3041100.58 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:48:09,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3970140.0, ans=0.1 2023-11-29 12:48:12,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3970206.6666666665, ans=0.025 2023-11-29 12:48:27,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3970273.3333333335, ans=0.2 2023-11-29 12:48:32,933 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595550 2023-11-29 12:48:41,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2023-11-29 12:48:44,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3970340.0, ans=0.125 2023-11-29 12:49:01,849 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6400, loss[loss=0.04964, simple_loss=0.06755, pruned_loss=0.005906, audio_tagging_loss=0.009954, over 14555.00 frames. 
], tot_loss[loss=0.06501, simple_loss=0.08879, pruned_loss=0.01184, audio_tagging_loss=0.008773, over 3044804.79 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:49:11,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3970473.3333333335, ans=0.2 2023-11-29 12:49:35,422 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595600 2023-11-29 12:49:50,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 9.115e+01 9.887e+01 1.069e+02 1.285e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 12:49:57,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3970740.0, ans=15.0 2023-11-29 12:50:03,033 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6450, loss[loss=0.05457, simple_loss=0.07091, pruned_loss=0.00844, audio_tagging_loss=0.01068, over 16594.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08875, pruned_loss=0.01178, audio_tagging_loss=0.00883, over 3050287.30 frames. ], batch size: 67, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:50:09,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3970806.6666666665, ans=10.0 2023-11-29 12:50:11,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3970806.6666666665, ans=0.125 2023-11-29 12:50:15,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3970873.3333333335, ans=0.0 2023-11-29 12:50:36,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3970940.0, ans=0.125 2023-11-29 12:50:37,718 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595650 2023-11-29 12:50:37,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3970940.0, ans=0.0 2023-11-29 12:50:42,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3971006.6666666665, ans=0.1 2023-11-29 12:51:05,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3971140.0, ans=0.125 2023-11-29 12:51:05,776 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6500, loss[loss=0.05836, simple_loss=0.07759, pruned_loss=0.009625, audio_tagging_loss=0.009946, over 16300.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08817, pruned_loss=0.01156, audio_tagging_loss=0.008692, over 3050884.11 frames. 
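Across these records grad_scale toggles between 16.0 and 32.0 (32.0 at batch 6400, back to 16.0 by batch 6450), the signature of dynamic fp16 loss scaling: the scaler doubles the scale after a run of overflow-free steps and halves it when gradients overflow. A sketch using torch.cuda.amp; the hyper-parameters and the model(batch) interface are illustrative rather than this run's configuration.

```python
# Sketch: dynamic loss scaling with torch.cuda.amp, which produces the
# kind of 16.0 <-> 32.0 oscillation seen in the grad_scale field above.
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

def train_step(model, batch, optimizer) -> float:
    optimizer.zero_grad()
    with autocast():
        loss = model(batch)          # assumed interface
    scaler.scale(loss).backward()
    scaler.step(optimizer)           # step is skipped on inf/nan grads
    scaler.update()                  # grows or backs off the scale
    return scaler.get_scale()        # the logged grad_scale
```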
], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:51:22,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3971206.6666666665, ans=0.125 2023-11-29 12:51:25,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3971206.6666666665, ans=0.125 2023-11-29 12:51:39,720 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595700 2023-11-29 12:51:44,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3971340.0, ans=0.0 2023-11-29 12:51:49,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3971340.0, ans=0.125 2023-11-29 12:51:54,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.181e+01 9.755e+01 1.057e+02 1.258e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 12:51:56,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0 2023-11-29 12:51:56,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3971406.6666666665, ans=0.0 2023-11-29 12:52:07,937 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6550, loss[loss=0.07101, simple_loss=0.1068, pruned_loss=0.01015, audio_tagging_loss=0.007456, over 16150.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08841, pruned_loss=0.01154, audio_tagging_loss=0.008602, over 3058811.88 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:52:08,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3971473.3333333335, ans=0.125 2023-11-29 12:52:09,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3971473.3333333335, ans=0.125 2023-11-29 12:52:28,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3971540.0, ans=0.04949747468305833 2023-11-29 12:52:29,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-29 12:52:41,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595750 2023-11-29 12:52:53,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=12.0 2023-11-29 12:52:56,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3971740.0, ans=0.125 2023-11-29 12:53:07,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-11-29 12:53:09,514 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6600, loss[loss=0.0402, simple_loss=0.05527, pruned_loss=0.003847, audio_tagging_loss=0.008721, over 15444.00 frames. ], tot_loss[loss=0.06346, simple_loss=0.08718, pruned_loss=0.01141, audio_tagging_loss=0.008453, over 3050464.26 frames. 
], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:53:22,857 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.91 vs. limit=10.0 2023-11-29 12:53:29,235 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:53:32,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3971940.0, ans=0.0 2023-11-29 12:53:42,795 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595800 2023-11-29 12:53:49,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3972006.6666666665, ans=0.5 2023-11-29 12:53:53,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3972006.6666666665, ans=0.2 2023-11-29 12:53:54,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3972006.6666666665, ans=0.125 2023-11-29 12:53:57,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.899e+01 9.360e+01 1.006e+02 1.174e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-29 12:54:11,710 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6650, loss[loss=0.07845, simple_loss=0.1126, pruned_loss=0.01431, audio_tagging_loss=0.007847, over 15508.00 frames. ], tot_loss[loss=0.06372, simple_loss=0.08761, pruned_loss=0.01145, audio_tagging_loss=0.008466, over 3043603.11 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:54:13,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3972140.0, ans=0.125 2023-11-29 12:54:23,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.40 vs. limit=5.0 2023-11-29 12:54:25,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. 
2023-11-29 12:54:30,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3972206.6666666665, ans=0.0
2023-11-29 12:54:33,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3972206.6666666665, ans=0.125
2023-11-29 12:54:35,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3972273.3333333335, ans=0.125
2023-11-29 12:54:38,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3972273.3333333335, ans=0.125
2023-11-29 12:54:40,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3972273.3333333335, ans=0.07
2023-11-29 12:54:44,985 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595850
2023-11-29 12:54:49,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3972340.0, ans=0.125
2023-11-29 12:55:13,869 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6700, loss[loss=0.04716, simple_loss=0.07157, pruned_loss=0.003406, audio_tagging_loss=0.007969, over 15560.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.08824, pruned_loss=0.01148, audio_tagging_loss=0.008355, over 3046789.23 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:55:24,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0
2023-11-29 12:55:27,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3972540.0, ans=0.1
2023-11-29 12:55:46,583 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595900
2023-11-29 12:55:48,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=12.0
2023-11-29 12:55:52,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3972673.3333333335, ans=0.1
2023-11-29 12:56:01,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 9.149e+01 9.695e+01 1.030e+02 1.289e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-29 12:56:08,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3972740.0, ans=0.125
2023-11-29 12:56:15,549 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6750, loss[loss=0.07344, simple_loss=0.1061, pruned_loss=0.0157, audio_tagging_loss=0.004707, over 14598.00 frames. ], tot_loss[loss=0.06368, simple_loss=0.08781, pruned_loss=0.01141, audio_tagging_loss=0.008364, over 3036762.43 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 12:56:45,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3972940.0, ans=0.0
2023-11-29 12:56:47,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0
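The scaling.py:213 records print the current value (ans=...) of a ScheduledFloat, a hyperparameter that varies with batch_count. A minimal sketch of the idea, assuming piecewise-linear interpolation between (batch, value) breakpoints; the example schedule is illustrative only, not taken from the recipe:

def scheduled_float(batch_count: float, schedule: list[tuple[float, float]]) -> float:
    # Piecewise-linear interpolation over (batch, value) breakpoints;
    # values are held constant outside the first/last breakpoint.
    (b0, v0) = schedule[0]
    if batch_count <= b0:
        return v0
    for (b1, v1) in schedule[1:]:
        if batch_count <= b1:
            t = (batch_count - b0) / (b1 - b0)
            return v0 + t * (v1 - v0)
        b0, v0 = b1, v1
    return v0

# A skip rate decaying from 0.5 to 0.0 over the first 20k batches would long since
# have settled at its final value by batch_count ~3.97e6, matching ans=0.0 above.
print(scheduled_float(3971340.0, [(0.0, 0.5), (20000.0, 0.0)]))  # -> 0.0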
2023-11-29 12:56:49,861 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 595950
2023-11-29 12:57:18,149 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6800, loss[loss=0.06024, simple_loss=0.08132, pruned_loss=0.0102, audio_tagging_loss=0.009387, over 15557.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.08817, pruned_loss=0.01162, audio_tagging_loss=0.008351, over 3043848.45 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:57:21,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0
2023-11-29 12:57:48,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3973273.3333333335, ans=0.125
2023-11-29 12:57:51,583 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596000
2023-11-29 12:58:09,396 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.975e+01 9.560e+01 1.019e+02 1.968e+02, threshold=1.912e+02, percent-clipped=1.0
2023-11-29 12:58:10,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3973406.6666666665, ans=0.125
2023-11-29 12:58:15,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3973406.6666666665, ans=0.0
2023-11-29 12:58:22,172 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6850, loss[loss=0.05163, simple_loss=0.06352, pruned_loss=0.009625, audio_tagging_loss=0.01024, over 15220.00 frames. ], tot_loss[loss=0.06382, simple_loss=0.08784, pruned_loss=0.01158, audio_tagging_loss=0.008309, over 3043802.10 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:58:38,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0
2023-11-29 12:58:40,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3973540.0, ans=0.125
2023-11-29 12:58:54,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3973606.6666666665, ans=0.125
2023-11-29 12:58:56,493 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596050
2023-11-29 12:59:09,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3973673.3333333335, ans=0.125
2023-11-29 12:59:21,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3973740.0, ans=0.95
2023-11-29 12:59:24,851 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6900, loss[loss=0.05328, simple_loss=0.07622, pruned_loss=0.005892, audio_tagging_loss=0.009279, over 14866.00 frames. ], tot_loss[loss=0.06367, simple_loss=0.08758, pruned_loss=0.01143, audio_tagging_loss=0.008453, over 3053810.32 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 12:59:36,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3973873.3333333335, ans=0.125
2023-11-29 12:59:45,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=12.0
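grad_scale alternates between 16.0 and 32.0 in the loss records around here, which is the signature of dynamic fp16 loss scaling ('use_fp16' is enabled for this run): the scale doubles after a long run of overflow-free steps and halves when a step overflows. A hedged sketch with torch.cuda.amp.GradScaler-style semantics; the constants are illustrative defaults, not values read from this run:

class DynamicScaler:
    def __init__(self, scale: float = 16.0, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000):
        self.scale = scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= self.backoff_factor  # e.g. 32.0 -> 16.0 on overflow
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor  # e.g. 16.0 -> 32.0 after a clean run
                self._good_steps = 0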
2023-11-29 12:59:56,981 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 12:59:57,990 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596100
2023-11-29 12:59:58,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3973940.0, ans=0.125
2023-11-29 13:00:01,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3974006.6666666665, ans=0.125
2023-11-29 13:00:13,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.949e+01 9.796e+01 1.035e+02 1.230e+02, threshold=1.959e+02, percent-clipped=0.0
2023-11-29 13:00:13,537 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:00:13,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3974073.3333333335, ans=0.5
2023-11-29 13:00:16,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3974073.3333333335, ans=0.125
2023-11-29 13:00:17,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3974073.3333333335, ans=0.0
2023-11-29 13:00:21,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3974073.3333333335, ans=0.0
2023-11-29 13:00:25,767 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 6950, loss[loss=0.05509, simple_loss=0.06554, pruned_loss=0.008147, audio_tagging_loss=0.01417, over 15866.00 frames. ], tot_loss[loss=0.0636, simple_loss=0.08731, pruned_loss=0.01143, audio_tagging_loss=0.008514, over 3049413.94 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:00:34,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3974140.0, ans=0.0
2023-11-29 13:00:37,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=22.5
2023-11-29 13:00:45,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0
2023-11-29 13:00:54,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0
2023-11-29 13:00:59,247 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596150
2023-11-29 13:01:27,346 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7000, loss[loss=0.07782, simple_loss=0.1094, pruned_loss=0.01655, audio_tagging_loss=0.006546, over 15215.00 frames. ], tot_loss[loss=0.06362, simple_loss=0.08742, pruned_loss=0.01136, audio_tagging_loss=0.00856, over 3045874.01 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
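The WARNING above is explained by its own arithmetic: the 1-second AudioSet placeholder cut has 100 feature frames, the subsampling frontend leaves 23, and the dummy transcript has 24 BPE tokens, so a transducer loss cannot align it (it needs at least one encoder frame per output token). A hedged sketch of such a filter; the frontend arithmetic is inferred from the logged 100 -> 23, not copied from the recipe:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # One plausible conv frontend with overall factor ~4: 100 frames -> 23.
    t_after = ((num_frames - 7) // 2 + 1 - 3) // 2 + 1
    # Keep the cut only if there is at least one frame per token.
    return t_after >= num_tokens

print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, so the cut is excluded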
2023-11-29 13:01:27,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3974473.3333333335, ans=0.125
2023-11-29 13:01:35,884 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:01:48,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3974540.0, ans=0.2
2023-11-29 13:01:49,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3974540.0, ans=0.125
2023-11-29 13:01:49,813 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:02:00,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3974606.6666666665, ans=0.0
2023-11-29 13:02:01,267 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596200
2023-11-29 13:02:11,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0
2023-11-29 13:02:16,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.861e+01 9.690e+01 1.060e+02 1.703e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-29 13:02:29,621 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7050, loss[loss=0.06254, simple_loss=0.08385, pruned_loss=0.01111, audio_tagging_loss=0.009498, over 16234.00 frames. ], tot_loss[loss=0.06362, simple_loss=0.08713, pruned_loss=0.01139, audio_tagging_loss=0.008662, over 3041026.59 frames. ], batch size: 62, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:02:35,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3974806.6666666665, ans=0.125
2023-11-29 13:02:43,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0
2023-11-29 13:03:02,683 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596250
2023-11-29 13:03:20,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3975073.3333333335, ans=0.2
2023-11-29 13:03:24,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3975073.3333333335, ans=0.05
2023-11-29 13:03:26,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3975073.3333333335, ans=0.2
2023-11-29 13:03:31,611 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7100, loss[loss=0.08423, simple_loss=0.1115, pruned_loss=0.01902, audio_tagging_loss=0.009444, over 14925.00 frames. ], tot_loss[loss=0.06366, simple_loss=0.08694, pruned_loss=0.01146, audio_tagging_loss=0.008727, over 3044025.44 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:03:37,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.31 vs. limit=10.0
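The constant lr: 1.35e-03 in the loss records is consistent with icefall's Eden schedule at this depth of training, given this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5, around epoch 50 and optimizer batch ~596000. A sketch of the Eden formula for checking that value (the fractional epoch passed in is an assumption):

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden decays the lr with both the batch index and the (fractional) epoch.
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 596000, 49.8):.2e}")  # ~1.34e-03, matching lr: 1.35e-03 above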
2023-11-29 13:03:41,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3975140.0, ans=0.05
2023-11-29 13:03:47,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3975206.6666666665, ans=0.125
2023-11-29 13:03:52,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3975206.6666666665, ans=0.125
2023-11-29 13:03:53,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3975206.6666666665, ans=0.0
2023-11-29 13:04:05,067 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596300
2023-11-29 13:04:10,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0
2023-11-29 13:04:11,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3975340.0, ans=0.125
2023-11-29 13:04:14,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3975340.0, ans=0.0
2023-11-29 13:04:15,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5
2023-11-29 13:04:20,790 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.433e+01 1.002e+02 1.073e+02 1.406e+02, threshold=2.003e+02, percent-clipped=0.0
2023-11-29 13:04:27,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3975406.6666666665, ans=0.125
2023-11-29 13:04:32,922 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7150, loss[loss=0.06581, simple_loss=0.09737, pruned_loss=0.009007, audio_tagging_loss=0.008121, over 15497.00 frames. ], tot_loss[loss=0.06364, simple_loss=0.08684, pruned_loss=0.01138, audio_tagging_loss=0.008842, over 3039187.60 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:04:37,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3975473.3333333335, ans=0.0
2023-11-29 13:05:06,668 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596350
2023-11-29 13:05:18,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3975673.3333333335, ans=0.05
2023-11-29 13:05:34,662 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7200, loss[loss=0.0648, simple_loss=0.09896, pruned_loss=0.009838, audio_tagging_loss=0.00548, over 14624.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08804, pruned_loss=0.01167, audio_tagging_loss=0.008812, over 3038327.60 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:05:53,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.77 vs. limit=15.0
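The Whitening records compare a per-module metric against a limit. A sketch in the spirit of icefall's Whiten (this mirrors the idea, not the exact code): the metric is the ratio mean(eigenvalue^2) / mean(eigenvalue)^2 of the grouped channel covariance, which is 1.0 when the features are perfectly white and grows as the spectrum becomes lopsided; a gradient penalty is active only while the metric exceeds the limit:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    (num_frames, num_channels) = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames       # per-group covariance
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean()              # trace/C = mean eigenvalue
    mean_eig_sq = (cov ** 2).sum(dim=(1, 2)).mean() / (num_channels // num_groups)
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)

x = torch.randn(200, 128)
# Slightly above 1.0 here due to sampling noise; exactly 1.0 for truly white features.
print(whitening_metric(x, num_groups=4))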
2023-11-29 13:06:08,354 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596400
2023-11-29 13:06:18,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3976006.6666666665, ans=0.125
2023-11-29 13:06:24,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.018e+01 9.297e+01 9.851e+01 1.057e+02 1.501e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-29 13:06:36,944 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7250, loss[loss=0.0819, simple_loss=0.1119, pruned_loss=0.01908, audio_tagging_loss=0.006854, over 14890.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08841, pruned_loss=0.01178, audio_tagging_loss=0.008827, over 3039752.39 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:06:40,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5
2023-11-29 13:06:49,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3976206.6666666665, ans=0.025
2023-11-29 13:06:56,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0
2023-11-29 13:07:07,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3976273.3333333335, ans=0.0
2023-11-29 13:07:09,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596450
2023-11-29 13:07:27,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3976406.6666666665, ans=0.2
2023-11-29 13:07:35,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3976406.6666666665, ans=0.1
2023-11-29 13:07:38,441 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7300, loss[loss=0.0627, simple_loss=0.08909, pruned_loss=0.009119, audio_tagging_loss=0.009037, over 15722.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.0882, pruned_loss=0.0117, audio_tagging_loss=0.008729, over 3040638.97 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:07:43,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0
2023-11-29 13:08:04,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=22.5
2023-11-29 13:08:10,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0
2023-11-29 13:08:11,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596500
2023-11-29 13:08:28,801 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.133e+01 9.755e+01 1.029e+02 1.413e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-29 13:08:39,150 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7350, loss[loss=0.07098, simple_loss=0.1086, pruned_loss=0.0115, audio_tagging_loss=0.00517, over 15414.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08831, pruned_loss=0.01162, audio_tagging_loss=0.008614, over 3043269.15 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:08:43,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3976806.6666666665, ans=0.125
2023-11-29 13:08:51,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3976873.3333333335, ans=0.125
2023-11-29 13:08:52,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0
2023-11-29 13:08:57,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3976873.3333333335, ans=0.07
2023-11-29 13:09:09,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3976940.0, ans=0.2
2023-11-29 13:09:11,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.63 vs. limit=22.5
2023-11-29 13:09:13,181 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596550
2023-11-29 13:09:15,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3977006.6666666665, ans=0.1
2023-11-29 13:09:20,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3977006.6666666665, ans=0.125
2023-11-29 13:09:31,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.88 vs. limit=22.5
2023-11-29 13:09:40,656 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7400, loss[loss=0.0542, simple_loss=0.07558, pruned_loss=0.007293, audio_tagging_loss=0.009114, over 14783.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08854, pruned_loss=0.01159, audio_tagging_loss=0.00854, over 3045262.40 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:09:40,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3977140.0, ans=0.2
2023-11-29 13:09:51,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3977140.0, ans=0.2
2023-11-29 13:10:03,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3977206.6666666665, ans=0.125
2023-11-29 13:10:06,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3977273.3333333335, ans=0.0
2023-11-29 13:10:13,997 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596600
2023-11-29 13:10:31,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.135e+01 9.143e+01 9.729e+01 1.063e+02 1.656e+02, threshold=1.946e+02, percent-clipped=0.0
2023-11-29 13:10:37,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3977406.6666666665, ans=0.125
2023-11-29 13:10:43,561 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7450, loss[loss=0.05343, simple_loss=0.067, pruned_loss=0.01224, audio_tagging_loss=0.007687, over 15168.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08985, pruned_loss=0.01203, audio_tagging_loss=0.008416, over 3045239.04 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:10:56,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3977540.0, ans=0.035
2023-11-29 13:10:56,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3977540.0, ans=0.0
2023-11-29 13:11:16,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596650
2023-11-29 13:11:26,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3977673.3333333335, ans=0.0
2023-11-29 13:11:44,377 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7500, loss[loss=0.05783, simple_loss=0.08219, pruned_loss=0.008441, audio_tagging_loss=0.008297, over 16214.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08981, pruned_loss=0.01213, audio_tagging_loss=0.008362, over 3045500.95 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:11:52,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3977806.6666666665, ans=0.1
2023-11-29 13:11:58,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3977873.3333333335, ans=0.125
2023-11-29 13:12:10,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3977940.0, ans=0.0
2023-11-29 13:12:18,362 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596700
2023-11-29 13:12:25,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3978006.6666666665, ans=0.125
2023-11-29 13:12:34,563 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 9.233e+01 9.870e+01 1.058e+02 1.396e+02, threshold=1.974e+02, percent-clipped=0.0
2023-11-29 13:12:43,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3978073.3333333335, ans=0.0
2023-11-29 13:12:45,791 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7550, loss[loss=0.06734, simple_loss=0.0959, pruned_loss=0.009532, audio_tagging_loss=0.009858, over 15181.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08839, pruned_loss=0.0119, audio_tagging_loss=0.008476, over 3042981.31 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:13:16,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3978273.3333333335, ans=0.125
2023-11-29 13:13:18,392 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596750
2023-11-29 13:13:24,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.35 vs. limit=6.0
2023-11-29 13:13:25,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3978340.0, ans=0.125
2023-11-29 13:13:27,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3978340.0, ans=0.09899494936611666
2023-11-29 13:13:28,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3978340.0, ans=0.0
2023-11-29 13:13:48,034 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7600, loss[loss=0.08242, simple_loss=0.1169, pruned_loss=0.01633, audio_tagging_loss=0.007654, over 15815.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08852, pruned_loss=0.01197, audio_tagging_loss=0.008465, over 3042108.84 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:13:57,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3978473.3333333335, ans=0.1
2023-11-29 13:13:57,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3978473.3333333335, ans=0.125
2023-11-29 13:14:04,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3978540.0, ans=0.125
2023-11-29 13:14:11,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3978606.6666666665, ans=22.5
2023-11-29 13:14:18,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3978606.6666666665, ans=0.125
2023-11-29 13:14:19,847 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596800
2023-11-29 13:14:21,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3978606.6666666665, ans=0.09899494936611666
2023-11-29 13:14:27,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3978673.3333333335, ans=0.0
2023-11-29 13:14:37,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3978740.0, ans=0.0
2023-11-29 13:14:37,837 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.734e+01 9.698e+01 1.076e+02 1.664e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-29 13:14:48,322 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7650, loss[loss=0.06031, simple_loss=0.07872, pruned_loss=0.0105, audio_tagging_loss=0.01045, over 15445.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08831, pruned_loss=0.01185, audio_tagging_loss=0.00841, over 3047994.09 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:14:54,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0
2023-11-29 13:14:58,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3978806.6666666665, ans=0.2
2023-11-29 13:15:21,920 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596850
2023-11-29 13:15:38,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3979073.3333333335, ans=0.2
2023-11-29 13:15:49,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3979140.0, ans=0.125
2023-11-29 13:15:49,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=12.0
2023-11-29 13:15:50,309 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7700, loss[loss=0.06457, simple_loss=0.07869, pruned_loss=0.01671, audio_tagging_loss=0.008516, over 14762.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.0879, pruned_loss=0.01185, audio_tagging_loss=0.008501, over 3046988.15 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:15:50,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3979140.0, ans=0.0
2023-11-29 13:15:52,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0
2023-11-29 13:16:23,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0
2023-11-29 13:16:24,020 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596900
2023-11-29 13:16:40,895 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.160e+01 9.812e+01 1.042e+02 1.331e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-29 13:16:52,043 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7750, loss[loss=0.0638, simple_loss=0.08624, pruned_loss=0.01177, audio_tagging_loss=0.008906, over 14196.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08813, pruned_loss=0.01186, audio_tagging_loss=0.008546, over 3045630.71 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:16:52,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.41 vs. limit=10.0
2023-11-29 13:17:04,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5
2023-11-29 13:17:14,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3979540.0, ans=0.0
2023-11-29 13:17:25,962 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 596950
2023-11-29 13:17:48,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3979740.0, ans=0.125
2023-11-29 13:17:54,650 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7800, loss[loss=0.07129, simple_loss=0.09976, pruned_loss=0.01485, audio_tagging_loss=0.006554, over 15413.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08807, pruned_loss=0.01177, audio_tagging_loss=0.00855, over 3044465.46 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:18:28,312 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597000
2023-11-29 13:18:31,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3980006.6666666665, ans=0.125
2023-11-29 13:18:35,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3980006.6666666665, ans=0.0
2023-11-29 13:18:46,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.044e+01 9.654e+01 1.045e+02 1.431e+02, threshold=1.931e+02, percent-clipped=0.0
2023-11-29 13:18:57,703 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7850, loss[loss=0.07088, simple_loss=0.1025, pruned_loss=0.01252, audio_tagging_loss=0.007099, over 16364.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08921, pruned_loss=0.01195, audio_tagging_loss=0.008522, over 3044076.81 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:19:21,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0
2023-11-29 13:19:31,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597050
2023-11-29 13:19:43,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3980340.0, ans=0.1
2023-11-29 13:19:59,728 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7900, loss[loss=0.06336, simple_loss=0.08295, pruned_loss=0.01151, audio_tagging_loss=0.01038, over 14870.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08929, pruned_loss=0.01195, audio_tagging_loss=0.008541, over 3052533.62 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:20:02,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0
2023-11-29 13:20:19,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3980540.0, ans=0.125
2023-11-29 13:20:29,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3980606.6666666665, ans=0.0
2023-11-29 13:20:32,860 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597100
2023-11-29 13:20:34,133 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:20:51,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.62 vs. limit=10.0
2023-11-29 13:20:52,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 9.084e+01 9.995e+01 1.068e+02 1.283e+02, threshold=1.999e+02, percent-clipped=0.0
2023-11-29 13:20:54,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3980740.0, ans=0.2
2023-11-29 13:21:01,676 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 7950, loss[loss=0.05066, simple_loss=0.06433, pruned_loss=0.008296, audio_tagging_loss=0.0102, over 14879.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08946, pruned_loss=0.01192, audio_tagging_loss=0.008583, over 3049745.04 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:21:20,007 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:21:31,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3980940.0, ans=0.0
2023-11-29 13:21:34,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3980940.0, ans=0.125
2023-11-29 13:21:35,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597150
2023-11-29 13:21:50,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.43 vs. limit=15.0
2023-11-29 13:22:00,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3981073.3333333335, ans=0.125
2023-11-29 13:22:04,439 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8000, loss[loss=0.06436, simple_loss=0.08554, pruned_loss=0.01258, audio_tagging_loss=0.009016, over 15653.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08885, pruned_loss=0.01174, audio_tagging_loss=0.008718, over 3044419.54 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:22:37,410 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597200
2023-11-29 13:22:42,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3981340.0, ans=0.125
2023-11-29 13:22:57,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 8.831e+01 9.452e+01 1.032e+02 1.267e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-29 13:23:06,824 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8050, loss[loss=0.06334, simple_loss=0.08746, pruned_loss=0.01121, audio_tagging_loss=0.008408, over 15186.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08914, pruned_loss=0.01184, audio_tagging_loss=0.008757, over 3043686.97 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:23:15,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3981473.3333333335, ans=0.0
2023-11-29 13:23:17,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3981473.3333333335, ans=0.125
2023-11-29 13:23:18,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0
2023-11-29 13:23:19,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3981540.0, ans=0.2
2023-11-29 13:23:39,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3981606.6666666665, ans=0.125
2023-11-29 13:23:40,530 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597250
2023-11-29 13:23:49,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3981673.3333333335, ans=0.05
2023-11-29 13:24:03,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=12.0
2023-11-29 13:24:08,357 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8100, loss[loss=0.05504, simple_loss=0.07304, pruned_loss=0.008946, audio_tagging_loss=0.00958, over 14279.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08951, pruned_loss=0.01196, audio_tagging_loss=0.008659, over 3040206.90 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:24:08,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3981806.6666666665, ans=0.125
2023-11-29 13:24:14,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2023-11-29 13:24:19,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3981873.3333333335, ans=0.125
2023-11-29 13:24:41,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597300
2023-11-29 13:24:54,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3982006.6666666665, ans=0.04949747468305833
2023-11-29 13:24:59,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.024e+01 9.571e+01 1.071e+02 1.276e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-29 13:25:08,648 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8150, loss[loss=0.08073, simple_loss=0.1096, pruned_loss=0.01959, audio_tagging_loss=0.006327, over 15038.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08927, pruned_loss=0.01203, audio_tagging_loss=0.008585, over 3036095.38 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:25:26,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3982206.6666666665, ans=0.125
2023-11-29 13:25:33,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3982273.3333333335, ans=0.125
2023-11-29 13:25:41,897 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597350
2023-11-29 13:25:51,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3982340.0, ans=0.95
2023-11-29 13:25:55,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3982340.0, ans=0.0
2023-11-29 13:25:55,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5
2023-11-29 13:25:56,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3982406.6666666665, ans=0.0
2023-11-29 13:26:10,390 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8200, loss[loss=0.06268, simple_loss=0.08411, pruned_loss=0.01245, audio_tagging_loss=0.008182, over 15988.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08878, pruned_loss=0.01189, audio_tagging_loss=0.008487, over 3039385.95 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:26:13,908 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 13:26:40,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3982606.6666666665, ans=0.0
2023-11-29 13:26:43,582 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597400
2023-11-29 13:26:48,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3982673.3333333335, ans=0.0
2023-11-29 13:26:50,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3982673.3333333335, ans=0.0
2023-11-29 13:27:03,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.841e+01 9.247e+01 9.743e+01 1.041e+02 1.327e+02, threshold=1.949e+02, percent-clipped=0.0
2023-11-29 13:27:05,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3982740.0, ans=0.125
2023-11-29 13:27:09,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3982740.0, ans=0.125
2023-11-29 13:27:10,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0
2023-11-29 13:27:12,561 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8250, loss[loss=0.05055, simple_loss=0.06753, pruned_loss=0.007103, audio_tagging_loss=0.009685, over 14866.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08828, pruned_loss=0.01182, audio_tagging_loss=0.008499, over 3041046.41 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:27:23,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3982873.3333333335, ans=0.0
2023-11-29 13:27:27,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3982873.3333333335, ans=0.125
2023-11-29 13:27:29,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3982873.3333333335, ans=0.0
2023-11-29 13:27:45,867 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597450
2023-11-29 13:27:48,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3983006.6666666665, ans=0.125
2023-11-29 13:27:51,793 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 13:27:56,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.14 vs. limit=15.0
2023-11-29 13:27:56,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3983006.6666666665, ans=0.02
2023-11-29 13:28:07,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3983073.3333333335, ans=0.1
2023-11-29 13:28:12,930 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8300, loss[loss=0.09585, simple_loss=0.1236, pruned_loss=0.02564, audio_tagging_loss=0.008423, over 15701.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08915, pruned_loss=0.012, audio_tagging_loss=0.008458, over 3042024.18 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:28:35,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3983206.6666666665, ans=0.0
2023-11-29 13:28:46,608 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597500
2023-11-29 13:28:58,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3983340.0, ans=0.125
2023-11-29 13:29:00,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3983406.6666666665, ans=0.2
2023-11-29 13:29:05,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.254e+01 9.916e+01 1.057e+02 1.310e+02, threshold=1.983e+02, percent-clipped=0.0
2023-11-29 13:29:06,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3983406.6666666665, ans=0.125
2023-11-29 13:29:08,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3983406.6666666665, ans=0.125
2023-11-29 13:29:14,567 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8350, loss[loss=0.05316, simple_loss=0.07258, pruned_loss=0.009583, audio_tagging_loss=0.007285, over 14505.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08947, pruned_loss=0.01195, audio_tagging_loss=0.008411, over 3039402.47 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:29:22,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3983473.3333333335, ans=0.2
2023-11-29 13:29:29,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3983540.0, ans=0.1
2023-11-29 13:29:31,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0
2023-11-29 13:29:47,134 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597550
2023-11-29 13:29:47,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0
2023-11-29 13:29:49,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3983606.6666666665, ans=0.2
2023-11-29 13:29:50,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3983673.3333333335, ans=0.125
2023-11-29 13:30:16,370 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8400, loss[loss=0.06455, simple_loss=0.08272, pruned_loss=0.01578, audio_tagging_loss=0.007404, over 13268.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08829, pruned_loss=0.01178, audio_tagging_loss=0.008434, over 3041572.86 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0
2023-11-29 13:30:26,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3983806.6666666665, ans=0.125
2023-11-29 13:30:30,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3983873.3333333335, ans=0.0
2023-11-29 13:30:33,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0
2023-11-29 13:30:34,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3983873.3333333335, ans=0.07
2023-11-29 13:30:34,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3983873.3333333335, ans=0.1
2023-11-29 13:30:43,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3983940.0, ans=0.125
2023-11-29 13:30:44,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3983940.0, ans=0.125
2023-11-29 13:30:49,609 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597600
2023-11-29 13:30:58,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3984006.6666666665, ans=0.125
2023-11-29 13:31:10,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.074e+01 9.912e+01 1.068e+02 1.273e+02, threshold=1.982e+02, percent-clipped=0.0
2023-11-29 13:31:17,385 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8450, loss[loss=0.07796, simple_loss=0.1137, pruned_loss=0.0141, audio_tagging_loss=0.006996, over 15702.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08826, pruned_loss=0.01181, audio_tagging_loss=0.00844, over 3043416.95 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:31:28,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3984206.6666666665, ans=0.125
2023-11-29 13:31:50,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3984273.3333333335, ans=0.0
2023-11-29 13:31:51,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597650
2023-11-29 13:32:10,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3984406.6666666665, ans=0.125
2023-11-29 13:32:18,963 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8500, loss[loss=0.08138, simple_loss=0.1087, pruned_loss=0.02052, audio_tagging_loss=0.006521, over 14261.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08934, pruned_loss=0.0121, audio_tagging_loss=0.008399, over 3050360.73 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:32:24,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0
2023-11-29 13:32:25,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3984473.3333333335, ans=0.0
2023-11-29 13:32:37,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3984540.0, ans=0.1
2023-11-29 13:32:53,035 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597700
2023-11-29 13:33:04,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3984673.3333333335, ans=0.0
2023-11-29 13:33:13,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.159e+01 8.994e+01 9.750e+01 1.041e+02 1.321e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-29 13:33:19,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3984740.0, ans=0.2
2023-11-29 13:33:21,348 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8550, loss[loss=0.07671, simple_loss=0.1019, pruned_loss=0.01602, audio_tagging_loss=0.00974, over 15671.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08908, pruned_loss=0.0119, audio_tagging_loss=0.008361, over 3055072.73 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:33:24,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=22.5
2023-11-29 13:33:38,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3984873.3333333335, ans=0.025
2023-11-29 13:33:44,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3984873.3333333335, ans=0.0
2023-11-29 13:33:54,559 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597750
2023-11-29 13:33:54,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3984940.0, ans=0.025
2023-11-29 13:34:18,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3985073.3333333335, ans=0.0
2023-11-29 13:34:22,926 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8600, loss[loss=0.06919, simple_loss=0.09288, pruned_loss=0.01422, audio_tagging_loss=0.008532, over 15915.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.0895, pruned_loss=0.01193, audio_tagging_loss=0.008372, over 3053903.52 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:34:26,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3985140.0, ans=0.07
2023-11-29 13:34:28,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3985140.0, ans=0.035
2023-11-29 13:34:57,433 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597800
2023-11-29 13:35:14,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3985406.6666666665, ans=15.0
2023-11-29 13:35:18,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 8.848e+01 9.575e+01 1.022e+02 1.291e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-29 13:35:25,293 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8650, loss[loss=0.0714, simple_loss=0.1015, pruned_loss=0.01329, audio_tagging_loss=0.007355, over 14722.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.0898, pruned_loss=0.01203, audio_tagging_loss=0.008442, over 3055585.80 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0
2023-11-29 13:35:26,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3985473.3333333335, ans=0.0
2023-11-29 13:35:36,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5
limit=22.5 2023-11-29 13:35:55,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3985606.6666666665, ans=0.1 2023-11-29 13:35:58,854 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597850 2023-11-29 13:36:12,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3985673.3333333335, ans=0.1 2023-11-29 13:36:17,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3985740.0, ans=0.125 2023-11-29 13:36:24,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3985740.0, ans=0.125 2023-11-29 13:36:27,076 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8700, loss[loss=0.04706, simple_loss=0.05044, pruned_loss=0.008749, audio_tagging_loss=0.01309, over 15347.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08897, pruned_loss=0.01194, audio_tagging_loss=0.008668, over 3050170.48 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:36:47,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3985873.3333333335, ans=0.1 2023-11-29 13:36:49,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3985873.3333333335, ans=0.125 2023-11-29 13:36:49,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-29 13:36:51,880 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:36:52,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2023-11-29 13:36:59,824 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597900 2023-11-29 13:37:21,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 9.317e+01 9.883e+01 1.072e+02 1.210e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 13:37:28,336 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8750, loss[loss=0.1001, simple_loss=0.141, pruned_loss=0.02345, audio_tagging_loss=0.006155, over 14902.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08932, pruned_loss=0.0121, audio_tagging_loss=0.008718, over 3050933.81 frames. 
], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:37:39,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3986206.6666666665, ans=0.05 2023-11-29 13:37:53,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3986273.3333333335, ans=0.035 2023-11-29 13:37:58,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3986273.3333333335, ans=0.0 2023-11-29 13:38:01,095 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 597950 2023-11-29 13:38:28,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3986473.3333333335, ans=0.125 2023-11-29 13:38:29,610 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8800, loss[loss=0.05929, simple_loss=0.07697, pruned_loss=0.01002, audio_tagging_loss=0.01079, over 14687.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09029, pruned_loss=0.01229, audio_tagging_loss=0.008788, over 3049600.02 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:38:39,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3986473.3333333335, ans=0.09899494936611666 2023-11-29 13:38:43,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3986540.0, ans=0.125 2023-11-29 13:38:43,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3986540.0, ans=0.125 2023-11-29 13:38:48,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3986540.0, ans=0.125 2023-11-29 13:38:56,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3986606.6666666665, ans=0.0 2023-11-29 13:39:00,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3986606.6666666665, ans=0.0 2023-11-29 13:39:02,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598000 2023-11-29 13:39:23,606 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.465e+01 1.025e+02 1.121e+02 1.304e+02, threshold=2.051e+02, percent-clipped=0.0 2023-11-29 13:39:31,205 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8850, loss[loss=0.05266, simple_loss=0.07303, pruned_loss=0.006196, audio_tagging_loss=0.009953, over 14720.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08972, pruned_loss=0.01205, audio_tagging_loss=0.008782, over 3059469.51 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:39:36,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3986806.6666666665, ans=0.1 2023-11-29 13:39:46,113 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 13:39:48,576 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:39:49,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3986873.3333333335, ans=0.125 2023-11-29 13:39:49,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3986873.3333333335, ans=0.125 2023-11-29 13:40:04,273 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598050 2023-11-29 13:40:12,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3987006.6666666665, ans=0.125 2023-11-29 13:40:15,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-29 13:40:19,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-29 13:40:26,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3987073.3333333335, ans=0.125 2023-11-29 13:40:32,892 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8900, loss[loss=0.05814, simple_loss=0.08173, pruned_loss=0.009467, audio_tagging_loss=0.007809, over 15226.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09056, pruned_loss=0.0121, audio_tagging_loss=0.008686, over 3056672.21 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:40:47,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3987206.6666666665, ans=0.09899494936611666 2023-11-29 13:40:48,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.14 vs. 
limit=22.5 2023-11-29 13:40:50,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3987206.6666666665, ans=0.07 2023-11-29 13:41:04,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3987273.3333333335, ans=0.07 2023-11-29 13:41:05,589 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598100 2023-11-29 13:41:17,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3987340.0, ans=0.035 2023-11-29 13:41:23,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3987406.6666666665, ans=0.125 2023-11-29 13:41:26,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 9.187e+01 9.849e+01 1.048e+02 1.202e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 13:41:30,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3987406.6666666665, ans=0.05 2023-11-29 13:41:30,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3987406.6666666665, ans=0.125 2023-11-29 13:41:34,323 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 8950, loss[loss=0.03784, simple_loss=0.04868, pruned_loss=0.00437, audio_tagging_loss=0.009125, over 15578.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08972, pruned_loss=0.01191, audio_tagging_loss=0.00854, over 3060483.70 frames. ], batch size: 61, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 13:41:35,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3987473.3333333335, ans=0.0 2023-11-29 13:41:39,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3987473.3333333335, ans=0.0 2023-11-29 13:41:44,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3987540.0, ans=0.125 2023-11-29 13:41:51,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3987540.0, ans=0.125 2023-11-29 13:41:57,694 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:42:02,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3987606.6666666665, ans=0.125 2023-11-29 13:42:07,434 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598150 2023-11-29 13:42:15,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3987673.3333333335, ans=0.2 2023-11-29 13:42:35,626 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9000, loss[loss=0.05837, simple_loss=0.08126, pruned_loss=0.009697, audio_tagging_loss=0.008039, over 15524.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08929, pruned_loss=0.01182, audio_tagging_loss=0.008458, over 3061556.07 frames. 
], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:42:35,627 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-29 13:42:55,100 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3609, 5.9350, 6.2764, 5.7462], device='cuda:3') 2023-11-29 13:43:16,185 INFO [train_asr.py:1267] (3/4) Epoch 50, validation: loss=0.05899, simple_loss=0.05036, pruned_loss=0.005383, audio_tagging_loss=0.02843, over 4681554.00 frames. 2023-11-29 13:43:16,186 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-29 13:43:29,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3987873.3333333335, ans=0.0 2023-11-29 13:43:49,511 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598200 2023-11-29 13:43:52,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3988006.6666666665, ans=0.2 2023-11-29 13:44:00,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3988006.6666666665, ans=0.2 2023-11-29 13:44:12,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 9.277e+01 9.847e+01 1.044e+02 1.354e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-29 13:44:18,118 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9050, loss[loss=0.05388, simple_loss=0.0736, pruned_loss=0.008733, audio_tagging_loss=0.008346, over 15852.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.0893, pruned_loss=0.01167, audio_tagging_loss=0.008357, over 3060873.10 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:44:50,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598250 2023-11-29 13:45:20,160 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9100, loss[loss=0.07694, simple_loss=0.1196, pruned_loss=0.01171, audio_tagging_loss=0.005419, over 16417.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.089, pruned_loss=0.01151, audio_tagging_loss=0.008321, over 3061930.59 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:45:30,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3988473.3333333335, ans=0.2 2023-11-29 13:45:54,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598300 2023-11-29 13:46:16,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.247e+01 9.830e+01 1.081e+02 1.321e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 13:46:22,633 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9150, loss[loss=0.05945, simple_loss=0.08296, pruned_loss=0.01074, audio_tagging_loss=0.00723, over 14983.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08973, pruned_loss=0.01153, audio_tagging_loss=0.008305, over 3059427.98 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:46:56,364 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598350 2023-11-29 13:47:25,248 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9200, loss[loss=0.04968, simple_loss=0.06949, pruned_loss=0.00749, audio_tagging_loss=0.007449, over 14782.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08912, pruned_loss=0.01145, audio_tagging_loss=0.008277, over 3052863.35 frames. 
], batch size: 56, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 13:47:35,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3989206.6666666665, ans=0.1 2023-11-29 13:47:53,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3989273.3333333335, ans=0.0 2023-11-29 13:47:57,942 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598400 2023-11-29 13:48:10,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3989340.0, ans=0.0 2023-11-29 13:48:13,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-29 13:48:21,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.841e+01 9.563e+01 1.025e+02 1.500e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 13:48:25,992 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9250, loss[loss=0.06584, simple_loss=0.094, pruned_loss=0.01227, audio_tagging_loss=0.006565, over 15609.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08883, pruned_loss=0.01154, audio_tagging_loss=0.008328, over 3043631.41 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:48:29,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3989473.3333333335, ans=0.125 2023-11-29 13:48:33,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3989473.3333333335, ans=0.0 2023-11-29 13:48:38,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3989540.0, ans=0.125 2023-11-29 13:48:48,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3989540.0, ans=0.1 2023-11-29 13:48:57,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3989606.6666666665, ans=0.125 2023-11-29 13:49:00,757 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598450 2023-11-29 13:49:10,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.38 vs. 
limit=15.0 2023-11-29 13:49:16,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3989740.0, ans=0.1 2023-11-29 13:49:19,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3989740.0, ans=0.2 2023-11-29 13:49:21,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3989740.0, ans=0.125 2023-11-29 13:49:24,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3989740.0, ans=0.125 2023-11-29 13:49:24,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3989740.0, ans=0.125 2023-11-29 13:49:25,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3989740.0, ans=0.125 2023-11-29 13:49:28,974 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9300, loss[loss=0.08092, simple_loss=0.1158, pruned_loss=0.01798, audio_tagging_loss=0.005034, over 15321.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08912, pruned_loss=0.01149, audio_tagging_loss=0.008295, over 3047482.33 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:49:44,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2023-11-29 13:49:46,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0 2023-11-29 13:49:54,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3989940.0, ans=0.125 2023-11-29 13:49:56,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3989940.0, ans=0.125 2023-11-29 13:49:58,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3989940.0, ans=0.0 2023-11-29 13:50:02,601 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598500 2023-11-29 13:50:06,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3990006.6666666665, ans=0.1 2023-11-29 13:50:09,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3990006.6666666665, ans=0.04949747468305833 2023-11-29 13:50:21,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=12.0 2023-11-29 13:50:25,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.873e+01 9.186e+01 9.880e+01 1.044e+02 1.365e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 13:50:27,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3990073.3333333335, ans=0.2 2023-11-29 13:50:30,894 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9350, loss[loss=0.06045, simple_loss=0.08013, pruned_loss=0.009605, audio_tagging_loss=0.01078, over 15219.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08877, pruned_loss=0.0115, audio_tagging_loss=0.008347, over 3041796.18 frames. 
], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:50:36,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3990140.0, ans=0.0 2023-11-29 13:50:53,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3990206.6666666665, ans=0.125 2023-11-29 13:51:04,919 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598550 2023-11-29 13:51:19,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3990406.6666666665, ans=0.0 2023-11-29 13:51:28,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-29 13:51:33,511 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9400, loss[loss=0.07836, simple_loss=0.1079, pruned_loss=0.01528, audio_tagging_loss=0.009126, over 14441.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08961, pruned_loss=0.01161, audio_tagging_loss=0.008443, over 3045109.83 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:51:44,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2023-11-29 13:52:06,956 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598600 2023-11-29 13:52:17,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3990673.3333333335, ans=0.1 2023-11-29 13:52:31,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 9.140e+01 9.680e+01 1.042e+02 1.691e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 13:52:35,709 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9450, loss[loss=0.07529, simple_loss=0.103, pruned_loss=0.01653, audio_tagging_loss=0.007246, over 14939.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08956, pruned_loss=0.01165, audio_tagging_loss=0.008535, over 3046887.46 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:52:36,903 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 13:52:42,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-11-29 13:52:49,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=22.5 2023-11-29 13:53:08,773 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598650 2023-11-29 13:53:09,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.58 vs. 
limit=12.0 2023-11-29 13:53:19,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3991006.6666666665, ans=0.0 2023-11-29 13:53:37,318 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9500, loss[loss=0.05618, simple_loss=0.07322, pruned_loss=0.007605, audio_tagging_loss=0.01196, over 15618.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08955, pruned_loss=0.01158, audio_tagging_loss=0.008622, over 3047705.22 frames. ], batch size: 61, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:53:39,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3991140.0, ans=0.125 2023-11-29 13:53:48,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3991206.6666666665, ans=0.125 2023-11-29 13:53:52,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3991206.6666666665, ans=0.1 2023-11-29 13:54:09,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3991273.3333333335, ans=0.125 2023-11-29 13:54:10,568 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598700 2023-11-29 13:54:19,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0 2023-11-29 13:54:33,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.161e+01 9.773e+01 1.049e+02 1.216e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-29 13:54:35,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3991406.6666666665, ans=0.125 2023-11-29 13:54:38,474 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9550, loss[loss=0.08154, simple_loss=0.1079, pruned_loss=0.0179, audio_tagging_loss=0.009673, over 15618.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08931, pruned_loss=0.01169, audio_tagging_loss=0.008624, over 3040970.56 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:54:56,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3991540.0, ans=0.125 2023-11-29 13:55:11,981 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598750 2023-11-29 13:55:16,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3991673.3333333335, ans=0.125 2023-11-29 13:55:40,321 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9600, loss[loss=0.07527, simple_loss=0.1069, pruned_loss=0.01333, audio_tagging_loss=0.00848, over 14603.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08818, pruned_loss=0.01161, audio_tagging_loss=0.008767, over 3038894.30 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:56:02,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. 
limit=15.0 2023-11-29 13:56:12,508 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598800 2023-11-29 13:56:14,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3991940.0, ans=0.125 2023-11-29 13:56:27,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3992006.6666666665, ans=0.0 2023-11-29 13:56:37,808 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 9.142e+01 9.552e+01 1.023e+02 1.315e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-29 13:56:39,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3992073.3333333335, ans=0.125 2023-11-29 13:56:41,288 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9650, loss[loss=0.07505, simple_loss=0.1022, pruned_loss=0.01285, audio_tagging_loss=0.01109, over 15371.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08826, pruned_loss=0.01157, audio_tagging_loss=0.008714, over 3043103.46 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:56:49,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2023-11-29 13:57:15,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598850 2023-11-29 13:57:22,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3992340.0, ans=0.125 2023-11-29 13:57:25,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3992340.0, ans=0.1 2023-11-29 13:57:37,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3992406.6666666665, ans=0.0 2023-11-29 13:57:42,309 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9700, loss[loss=0.07711, simple_loss=0.109, pruned_loss=0.01725, audio_tagging_loss=0.005346, over 14930.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0888, pruned_loss=0.0118, audio_tagging_loss=0.008637, over 3048496.64 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:58:09,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3992606.6666666665, ans=0.125 2023-11-29 13:58:11,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3992606.6666666665, ans=0.125 2023-11-29 13:58:11,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-29 13:58:15,921 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598900 2023-11-29 13:58:19,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.26 vs. 
limit=15.0 2023-11-29 13:58:22,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3992673.3333333335, ans=0.0 2023-11-29 13:58:27,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3992673.3333333335, ans=0.125 2023-11-29 13:58:28,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3992673.3333333335, ans=0.1 2023-11-29 13:58:37,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3992740.0, ans=0.025 2023-11-29 13:58:41,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.286e+01 9.993e+01 1.057e+02 1.299e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 13:58:44,671 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9750, loss[loss=0.05637, simple_loss=0.07695, pruned_loss=0.007816, audio_tagging_loss=0.01007, over 15457.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08904, pruned_loss=0.0118, audio_tagging_loss=0.008579, over 3042356.51 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:58:46,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3992806.6666666665, ans=0.04949747468305833 2023-11-29 13:58:47,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. limit=10.0 2023-11-29 13:58:56,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3992873.3333333335, ans=0.95 2023-11-29 13:59:03,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3992873.3333333335, ans=0.1 2023-11-29 13:59:16,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 598950 2023-11-29 13:59:18,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3992940.0, ans=6.0 2023-11-29 13:59:29,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3993006.6666666665, ans=0.0 2023-11-29 13:59:42,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=12.0 2023-11-29 13:59:44,650 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9800, loss[loss=0.0565, simple_loss=0.07507, pruned_loss=0.01196, audio_tagging_loss=0.007008, over 15028.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.0888, pruned_loss=0.01175, audio_tagging_loss=0.008427, over 3034898.35 frames. 
], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:00:18,747 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599000 2023-11-29 14:00:40,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3993406.6666666665, ans=0.0 2023-11-29 14:00:40,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3993406.6666666665, ans=0.125 2023-11-29 14:00:43,058 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:00:44,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 9.016e+01 9.640e+01 1.052e+02 1.257e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 14:00:46,481 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9850, loss[loss=0.06122, simple_loss=0.08622, pruned_loss=0.009698, audio_tagging_loss=0.00841, over 15705.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08921, pruned_loss=0.01193, audio_tagging_loss=0.008441, over 3034552.92 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:01:01,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3993540.0, ans=0.04949747468305833 2023-11-29 14:01:05,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3993540.0, ans=10.0 2023-11-29 14:01:17,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3993606.6666666665, ans=0.2 2023-11-29 14:01:20,026 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599050 2023-11-29 14:01:22,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2023-11-29 14:01:47,695 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9900, loss[loss=0.06989, simple_loss=0.09114, pruned_loss=0.01367, audio_tagging_loss=0.01065, over 15549.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08994, pruned_loss=0.01201, audio_tagging_loss=0.008327, over 3032666.45 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:01:54,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.16 vs. 
limit=22.5 2023-11-29 14:02:07,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3993873.3333333335, ans=0.2 2023-11-29 14:02:16,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3993940.0, ans=0.125 2023-11-29 14:02:20,711 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599100 2023-11-29 14:02:34,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3994006.6666666665, ans=0.2 2023-11-29 14:02:34,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3994006.6666666665, ans=0.125 2023-11-29 14:02:42,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3994073.3333333335, ans=0.2 2023-11-29 14:02:46,877 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 9.233e+01 9.702e+01 1.026e+02 1.352e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 14:02:49,341 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 9950, loss[loss=0.05167, simple_loss=0.07524, pruned_loss=0.005969, audio_tagging_loss=0.008086, over 15592.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08936, pruned_loss=0.01194, audio_tagging_loss=0.008357, over 3037494.86 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:03:03,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=12.0 2023-11-29 14:03:06,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.32 vs. limit=15.0 2023-11-29 14:03:22,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599150 2023-11-29 14:03:33,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3994340.0, ans=0.0 2023-11-29 14:03:40,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3994406.6666666665, ans=0.0 2023-11-29 14:03:51,255 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10000, loss[loss=0.06652, simple_loss=0.09309, pruned_loss=0.01092, audio_tagging_loss=0.009055, over 14470.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08967, pruned_loss=0.01192, audio_tagging_loss=0.008343, over 3037824.22 frames. 
], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:03:56,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3994473.3333333335, ans=0.1 2023-11-29 14:03:58,462 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:04:03,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3994540.0, ans=0.125 2023-11-29 14:04:04,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3994540.0, ans=0.125 2023-11-29 14:04:24,908 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599200 2023-11-29 14:04:47,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3994740.0, ans=0.125 2023-11-29 14:04:48,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3994740.0, ans=0.125 2023-11-29 14:04:50,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.059e+01 9.244e+01 9.877e+01 1.049e+02 1.463e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-29 14:04:53,646 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10050, loss[loss=0.06057, simple_loss=0.08668, pruned_loss=0.009046, audio_tagging_loss=0.008181, over 14675.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08936, pruned_loss=0.01172, audio_tagging_loss=0.008395, over 3035020.48 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:05:13,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3994873.3333333335, ans=0.0 2023-11-29 14:05:27,127 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599250 2023-11-29 14:05:35,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3995006.6666666665, ans=0.04949747468305833 2023-11-29 14:05:35,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3995006.6666666665, ans=0.0 2023-11-29 14:05:50,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3995073.3333333335, ans=0.09899494936611666 2023-11-29 14:05:56,146 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10100, loss[loss=0.07413, simple_loss=0.1004, pruned_loss=0.01188, audio_tagging_loss=0.01205, over 14717.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08977, pruned_loss=0.01178, audio_tagging_loss=0.008453, over 3038228.66 frames. 
], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:06:02,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3995140.0, ans=0.1 2023-11-29 14:06:10,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3995206.6666666665, ans=0.125 2023-11-29 14:06:12,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3995206.6666666665, ans=0.125 2023-11-29 14:06:18,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3995206.6666666665, ans=0.07 2023-11-29 14:06:19,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3995273.3333333335, ans=0.125 2023-11-29 14:06:28,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=12.0 2023-11-29 14:06:29,040 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599300 2023-11-29 14:06:44,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3995406.6666666665, ans=0.125 2023-11-29 14:06:48,871 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:06:53,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2023-11-29 14:06:54,611 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.067e+01 9.808e+01 1.052e+02 1.257e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:06:55,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-29 14:06:57,696 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10150, loss[loss=0.06263, simple_loss=0.09388, pruned_loss=0.007532, audio_tagging_loss=0.008157, over 15843.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08883, pruned_loss=0.0118, audio_tagging_loss=0.008495, over 3034613.13 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:07:29,261 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 14:07:31,191 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599350 2023-11-29 14:07:32,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3995606.6666666665, ans=0.125 2023-11-29 14:07:33,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3995673.3333333335, ans=0.125 2023-11-29 14:07:54,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3995740.0, ans=0.125 2023-11-29 14:07:54,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-29 14:07:58,623 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10200, loss[loss=0.06312, simple_loss=0.08314, pruned_loss=0.01291, audio_tagging_loss=0.008638, over 14880.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08839, pruned_loss=0.01175, audio_tagging_loss=0.008632, over 3039436.59 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:08:04,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2023-11-29 14:08:07,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-29 14:08:10,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3995873.3333333335, ans=0.0 2023-11-29 14:08:14,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3995873.3333333335, ans=0.0 2023-11-29 14:08:14,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-11-29 14:08:16,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3995873.3333333335, ans=0.1 2023-11-29 14:08:25,149 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 14:08:32,967 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599400 2023-11-29 14:08:41,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3996006.6666666665, ans=0.125 2023-11-29 14:08:47,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3996073.3333333335, ans=0.09899494936611666 2023-11-29 14:08:55,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3996073.3333333335, ans=0.1 2023-11-29 14:08:59,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.884e+01 9.205e+01 9.737e+01 1.021e+02 2.393e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-29 14:09:01,482 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10250, loss[loss=0.06644, simple_loss=0.08504, pruned_loss=0.01387, audio_tagging_loss=0.01005, over 14963.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08795, pruned_loss=0.01164, audio_tagging_loss=0.008703, over 3039438.47 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:09:10,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3996140.0, ans=0.125 2023-11-29 14:09:27,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-11-29 14:09:33,762 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599450 2023-11-29 14:09:36,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2023-11-29 14:09:44,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2023-11-29 14:09:49,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3996340.0, ans=0.125 2023-11-29 14:09:59,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3996406.6666666665, ans=0.0 2023-11-29 14:10:02,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3996473.3333333335, ans=0.0 2023-11-29 14:10:03,462 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10300, loss[loss=0.05677, simple_loss=0.08082, pruned_loss=0.008386, audio_tagging_loss=0.007976, over 15579.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08828, pruned_loss=0.01158, audio_tagging_loss=0.008576, over 3045229.52 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:10:06,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3996473.3333333335, ans=0.125 2023-11-29 14:10:08,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.37 vs. 
limit=22.5 2023-11-29 14:10:16,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3996540.0, ans=0.125 2023-11-29 14:10:37,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3996606.6666666665, ans=0.2 2023-11-29 14:10:38,086 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599500 2023-11-29 14:10:42,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3996673.3333333335, ans=0.125 2023-11-29 14:10:45,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3996673.3333333335, ans=0.125 2023-11-29 14:11:03,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 9.141e+01 9.728e+01 1.059e+02 1.776e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 14:11:06,030 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10350, loss[loss=0.07862, simple_loss=0.1048, pruned_loss=0.01574, audio_tagging_loss=0.01049, over 13929.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08793, pruned_loss=0.01156, audio_tagging_loss=0.008719, over 3042824.99 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:11:06,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0 2023-11-29 14:11:37,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3996940.0, ans=0.125 2023-11-29 14:11:40,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599550 2023-11-29 14:11:47,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3997006.6666666665, ans=0.0 2023-11-29 14:12:08,457 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10400, loss[loss=0.0806, simple_loss=0.1034, pruned_loss=0.02203, audio_tagging_loss=0.006883, over 14689.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08833, pruned_loss=0.01162, audio_tagging_loss=0.008791, over 3044295.38 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:12:08,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3997140.0, ans=0.125 2023-11-29 14:12:18,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3997140.0, ans=0.09899494936611666 2023-11-29 14:12:19,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3997206.6666666665, ans=0.0 2023-11-29 14:12:33,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3997273.3333333335, ans=0.125 2023-11-29 14:12:40,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. 
limit=15.0 2023-11-29 14:12:41,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3997273.3333333335, ans=0.0 2023-11-29 14:12:42,159 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599600 2023-11-29 14:13:00,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3997406.6666666665, ans=0.1 2023-11-29 14:13:08,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.87 vs. limit=15.0 2023-11-29 14:13:08,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.101e+01 9.809e+01 1.030e+02 1.340e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:13:10,936 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10450, loss[loss=0.04883, simple_loss=0.06612, pruned_loss=0.006589, audio_tagging_loss=0.009187, over 15326.00 frames. ], tot_loss[loss=0.06366, simple_loss=0.08672, pruned_loss=0.01136, audio_tagging_loss=0.008943, over 3031838.81 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:13:17,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.02 vs. limit=15.0 2023-11-29 14:13:29,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2023-11-29 14:13:30,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3997540.0, ans=0.0 2023-11-29 14:13:44,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599650 2023-11-29 14:13:53,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3997673.3333333335, ans=0.0 2023-11-29 14:13:57,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3997673.3333333335, ans=0.2 2023-11-29 14:13:57,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3997673.3333333335, ans=0.1 2023-11-29 14:13:58,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3997740.0, ans=0.125 2023-11-29 14:14:12,923 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10500, loss[loss=0.05851, simple_loss=0.08071, pruned_loss=0.008162, audio_tagging_loss=0.009995, over 15773.00 frames. ], tot_loss[loss=0.0636, simple_loss=0.08677, pruned_loss=0.01142, audio_tagging_loss=0.008792, over 3038394.58 frames. 
2023-11-29 14:14:45,934 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599700
2023-11-29 14:14:55,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3998006.6666666665, ans=0.125
2023-11-29 14:15:03,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3998073.3333333335, ans=0.0
2023-11-29 14:15:14,104 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 9.178e+01 9.890e+01 1.072e+02 1.434e+02, threshold=1.978e+02, percent-clipped=0.0
2023-11-29 14:15:15,336 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10550, loss[loss=0.04768, simple_loss=0.05558, pruned_loss=0.009475, audio_tagging_loss=0.01041, over 14280.00 frames. ], tot_loss[loss=0.06345, simple_loss=0.08678, pruned_loss=0.01147, audio_tagging_loss=0.008588, over 3037553.85 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:15:24,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.67 vs. limit=22.5
2023-11-29 14:15:30,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3998206.6666666665, ans=0.125
2023-11-29 14:15:30,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0
2023-11-29 14:15:48,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599750
2023-11-29 14:15:56,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3998340.0, ans=0.125
2023-11-29 14:16:16,372 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10600, loss[loss=0.08593, simple_loss=0.119, pruned_loss=0.01874, audio_tagging_loss=0.007678, over 14699.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.0891, pruned_loss=0.01177, audio_tagging_loss=0.008394, over 3037929.25 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:16:16,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3998473.3333333335, ans=12.0
2023-11-29 14:16:36,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3998540.0, ans=22.5
2023-11-29 14:16:36,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0
2023-11-29 14:16:41,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3998606.6666666665, ans=0.125
2023-11-29 14:16:50,360 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599800
2023-11-29 14:16:53,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3998673.3333333335, ans=0.125
2023-11-29 14:17:02,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3998673.3333333335, ans=0.1
2023-11-29 14:17:07,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3998740.0, ans=0.1
2023-11-29 14:17:17,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.938e+01 9.773e+01 1.042e+02 1.293e+02, threshold=1.955e+02, percent-clipped=0.0
2023-11-29 14:17:19,084 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10650, loss[loss=0.07508, simple_loss=0.1094, pruned_loss=0.01585, audio_tagging_loss=0.004536, over 15527.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08947, pruned_loss=0.01183, audio_tagging_loss=0.008298, over 3039653.10 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:17:33,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3998873.3333333335, ans=0.0
2023-11-29 14:17:37,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3998873.3333333335, ans=0.1
2023-11-29 14:17:51,897 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599850
2023-11-29 14:18:11,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0
2023-11-29 14:18:20,951 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10700, loss[loss=0.06005, simple_loss=0.08141, pruned_loss=0.007365, audio_tagging_loss=0.01198, over 14355.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08959, pruned_loss=0.0119, audio_tagging_loss=0.008287, over 3037891.51 frames. ], batch size: 52, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:18:25,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3999140.0, ans=0.125
2023-11-29 14:18:36,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3999206.6666666665, ans=0.2
2023-11-29 14:18:46,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0
2023-11-29 14:18:54,340 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599900
2023-11-29 14:19:02,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=15.0
2023-11-29 14:19:16,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3999406.6666666665, ans=0.125
2023-11-29 14:19:21,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 9.059e+01 9.669e+01 1.032e+02 1.449e+02, threshold=1.934e+02, percent-clipped=0.0
2023-11-29 14:19:22,375 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10750, loss[loss=0.07619, simple_loss=0.1019, pruned_loss=0.0174, audio_tagging_loss=0.007827, over 15151.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08901, pruned_loss=0.01194, audio_tagging_loss=0.00829, over 3037614.77 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:19:26,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3999473.3333333335, ans=0.1
2023-11-29 14:19:47,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3999606.6666666665, ans=0.1
2023-11-29 14:19:56,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 599950
2023-11-29 14:19:58,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0
2023-11-29 14:20:01,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3999673.3333333335, ans=0.2
2023-11-29 14:20:05,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3999673.3333333335, ans=6.0
2023-11-29 14:20:20,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3999740.0, ans=0.125
2023-11-29 14:20:23,911 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10800, loss[loss=0.06098, simple_loss=0.08252, pruned_loss=0.009907, audio_tagging_loss=0.009817, over 15245.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08872, pruned_loss=0.01171, audio_tagging_loss=0.008266, over 3049669.43 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 32.0
2023-11-29 14:20:43,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.74 vs. limit=22.5
2023-11-29 14:20:45,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3999873.3333333335, ans=15.0
2023-11-29 14:20:46,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3999873.3333333335, ans=0.125
2023-11-29 14:20:56,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3999940.0, ans=0.125
2023-11-29 14:20:57,275 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600000
2023-11-29 14:20:57,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3999940.0, ans=0.125
2023-11-29 14:21:09,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4000006.6666666665, ans=0.0
2023-11-29 14:21:18,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4000073.3333333335, ans=0.0
2023-11-29 14:21:26,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4000073.3333333335, ans=15.0
2023-11-29 14:21:28,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 9.037e+01 9.839e+01 1.048e+02 1.354e+02, threshold=1.968e+02, percent-clipped=0.0
2023-11-29 14:21:28,932 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10850, loss[loss=0.06165, simple_loss=0.08167, pruned_loss=0.01079, audio_tagging_loss=0.01002, over 15120.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08859, pruned_loss=0.01172, audio_tagging_loss=0.008429, over 3046140.51 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:21:38,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=22.5
2023-11-29 14:22:02,288 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600050
2023-11-29 14:22:11,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4000340.0, ans=0.125
2023-11-29 14:22:26,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4000406.6666666665, ans=0.125
2023-11-29 14:22:30,823 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10900, loss[loss=0.06469, simple_loss=0.09325, pruned_loss=0.008969, audio_tagging_loss=0.009093, over 14138.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.0881, pruned_loss=0.01179, audio_tagging_loss=0.008531, over 3040390.85 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:22:30,862 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 14:22:50,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4000540.0, ans=0.125
2023-11-29 14:22:54,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4000606.6666666665, ans=0.125
2023-11-29 14:22:57,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4000606.6666666665, ans=0.125
2023-11-29 14:23:03,912 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600100
2023-11-29 14:23:06,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0
2023-11-29 14:23:19,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.30 vs. limit=5.0
2023-11-29 14:23:23,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0
2023-11-29 14:23:32,237 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.967e+01 9.295e+01 9.770e+01 1.037e+02 1.464e+02, threshold=1.954e+02, percent-clipped=0.0
2023-11-29 14:23:32,268 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 10950, loss[loss=0.04463, simple_loss=0.05326, pruned_loss=0.009794, audio_tagging_loss=0.008205, over 14820.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08853, pruned_loss=0.01192, audio_tagging_loss=0.008582, over 3044063.53 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:23:34,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4000806.6666666665, ans=0.2
2023-11-29 14:23:42,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4000806.6666666665, ans=0.125
2023-11-29 14:24:04,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0
2023-11-29 14:24:05,708 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600150
2023-11-29 14:24:08,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4001006.6666666665, ans=0.125
2023-11-29 14:24:16,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4001006.6666666665, ans=0.2
2023-11-29 14:24:31,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4001073.3333333335, ans=0.125
2023-11-29 14:24:34,448 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11000, loss[loss=0.07986, simple_loss=0.1202, pruned_loss=0.01335, audio_tagging_loss=0.006393, over 15687.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08903, pruned_loss=0.01182, audio_tagging_loss=0.008496, over 3045463.66 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:24:41,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4001140.0, ans=0.125
2023-11-29 14:24:48,123 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 14:24:56,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4001206.6666666665, ans=0.0
2023-11-29 14:24:59,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4001273.3333333335, ans=0.1
2023-11-29 14:25:05,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4001273.3333333335, ans=0.1
2023-11-29 14:25:07,589 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600200
2023-11-29 14:25:36,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.890e+01 9.536e+01 1.018e+02 1.298e+02, threshold=1.907e+02, percent-clipped=0.0
2023-11-29 14:25:36,614 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11050, loss[loss=0.06506, simple_loss=0.09155, pruned_loss=0.0116, audio_tagging_loss=0.007685, over 15531.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08875, pruned_loss=0.01179, audio_tagging_loss=0.008594, over 3047038.56 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:25:37,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=12.0
2023-11-29 14:25:42,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4001473.3333333335, ans=0.2
2023-11-29 14:26:09,882 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600250
2023-11-29 14:26:10,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4001606.6666666665, ans=0.125
2023-11-29 14:26:14,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4001673.3333333335, ans=0.07
2023-11-29 14:26:14,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4001673.3333333335, ans=0.2
2023-11-29 14:26:19,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5
2023-11-29 14:26:26,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4001740.0, ans=0.0
2023-11-29 14:26:32,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4001740.0, ans=0.0
2023-11-29 14:26:32,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0
2023-11-29 14:26:33,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4001740.0, ans=0.125
2023-11-29 14:26:34,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4001740.0, ans=0.125
2023-11-29 14:26:38,312 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11100, loss[loss=0.07298, simple_loss=0.1062, pruned_loss=0.01313, audio_tagging_loss=0.006729, over 15640.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08901, pruned_loss=0.01176, audio_tagging_loss=0.008701, over 3048151.84 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:26:40,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4001806.6666666665, ans=0.0
2023-11-29 14:26:44,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4001806.6666666665, ans=0.1
2023-11-29 14:26:55,422 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 14:27:11,749 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600300
2023-11-29 14:27:40,199 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 9.265e+01 9.785e+01 1.064e+02 1.288e+02, threshold=1.957e+02, percent-clipped=0.0
2023-11-29 14:27:40,231 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11150, loss[loss=0.05043, simple_loss=0.06495, pruned_loss=0.007623, audio_tagging_loss=0.01033, over 13519.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08863, pruned_loss=0.01162, audio_tagging_loss=0.008777, over 3048475.64 frames. ], batch size: 52, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:27:53,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0
2023-11-29 14:28:04,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4002273.3333333335, ans=0.0
2023-11-29 14:28:07,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4002273.3333333335, ans=0.0
2023-11-29 14:28:09,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.88 vs. limit=22.5
2023-11-29 14:28:13,820 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600350
2023-11-29 14:28:38,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4002406.6666666665, ans=0.0
2023-11-29 14:28:41,703 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11200, loss[loss=0.08189, simple_loss=0.1147, pruned_loss=0.01615, audio_tagging_loss=0.008368, over 17046.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08856, pruned_loss=0.01168, audio_tagging_loss=0.008751, over 3049415.83 frames. ], batch size: 64, lr: 1.34e-03, grad_scale: 32.0
2023-11-29 14:29:14,755 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600400
2023-11-29 14:29:15,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.31 vs. limit=6.0
2023-11-29 14:29:19,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4002673.3333333335, ans=0.125
2023-11-29 14:29:40,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4002740.0, ans=0.0
2023-11-29 14:29:43,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 9.140e+01 9.616e+01 1.036e+02 1.306e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-29 14:29:43,076 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11250, loss[loss=0.06206, simple_loss=0.08826, pruned_loss=0.01045, audio_tagging_loss=0.007483, over 16501.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08805, pruned_loss=0.01159, audio_tagging_loss=0.008764, over 3052074.54 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 32.0
2023-11-29 14:29:48,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4002806.6666666665, ans=0.125
2023-11-29 14:29:55,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.93 vs. limit=6.0
2023-11-29 14:29:57,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4002873.3333333335, ans=0.2
2023-11-29 14:30:17,304 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600450
2023-11-29 14:30:44,765 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11300, loss[loss=0.06861, simple_loss=0.09502, pruned_loss=0.01341, audio_tagging_loss=0.007689, over 15782.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08841, pruned_loss=0.01178, audio_tagging_loss=0.008727, over 3048639.27 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:30:45,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4003140.0, ans=0.125
2023-11-29 14:30:50,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4003140.0, ans=0.125
2023-11-29 14:30:58,983 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 14:31:18,220 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600500
2023-11-29 14:31:44,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4003406.6666666665, ans=0.0
2023-11-29 14:31:46,829 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11350, loss[loss=0.0737, simple_loss=0.09297, pruned_loss=0.01941, audio_tagging_loss=0.007802, over 14098.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08872, pruned_loss=0.01179, audio_tagging_loss=0.008636, over 3043689.77 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:31:49,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 9.251e+01 9.881e+01 1.050e+02 2.034e+02, threshold=1.976e+02, percent-clipped=1.0
2023-11-29 14:32:19,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600550
2023-11-29 14:32:39,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4003740.0, ans=0.125
2023-11-29 14:32:48,388 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11400, loss[loss=0.05493, simple_loss=0.07454, pruned_loss=0.009607, audio_tagging_loss=0.008056, over 16105.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08914, pruned_loss=0.01185, audio_tagging_loss=0.008465, over 3044124.67 frames. ], batch size: 61, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:32:52,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4003806.6666666665, ans=0.0
2023-11-29 14:33:02,124 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 14:33:06,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4003873.3333333335, ans=0.0
2023-11-29 14:33:07,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.84 vs. limit=10.0
2023-11-29 14:33:22,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600600
2023-11-29 14:33:39,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4004073.3333333335, ans=0.0
2023-11-29 14:33:45,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4004073.3333333335, ans=0.125
2023-11-29 14:33:49,935 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11450, loss[loss=0.06925, simple_loss=0.0945, pruned_loss=0.01165, audio_tagging_loss=0.01035, over 15546.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08901, pruned_loss=0.0119, audio_tagging_loss=0.008434, over 3039038.73 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:33:52,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 9.290e+01 9.810e+01 1.057e+02 1.472e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-29 14:33:57,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0
2023-11-29 14:34:21,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0
2023-11-29 14:34:24,126 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600650
2023-11-29 14:34:25,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0
2023-11-29 14:34:53,813 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11500, loss[loss=0.05829, simple_loss=0.07752, pruned_loss=0.01173, audio_tagging_loss=0.007805, over 14637.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08853, pruned_loss=0.01181, audio_tagging_loss=0.008504, over 3042984.68 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:35:06,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4004540.0, ans=0.125
2023-11-29 14:35:11,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4004540.0, ans=0.0
2023-11-29 14:35:20,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.39 vs. limit=10.0
2023-11-29 14:35:26,647 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600700
2023-11-29 14:35:28,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4004606.6666666665, ans=0.125
2023-11-29 14:35:31,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0
2023-11-29 14:35:32,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4004673.3333333335, ans=0.2
2023-11-29 14:35:47,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4004740.0, ans=0.125
2023-11-29 14:35:50,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4004740.0, ans=0.0
2023-11-29 14:35:55,394 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11550, loss[loss=0.0765, simple_loss=0.1065, pruned_loss=0.01597, audio_tagging_loss=0.00729, over 15689.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08902, pruned_loss=0.01187, audio_tagging_loss=0.008516, over 3045521.02 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:35:57,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 9.007e+01 9.636e+01 1.040e+02 1.609e+02, threshold=1.927e+02, percent-clipped=0.0
2023-11-29 14:36:02,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0
2023-11-29 14:36:04,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4004806.6666666665, ans=0.0
2023-11-29 14:36:13,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4004873.3333333335, ans=0.125
2023-11-29 14:36:20,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4004940.0, ans=0.125
2023-11-29 14:36:28,689 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600750
2023-11-29 14:36:33,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4005006.6666666665, ans=0.0
2023-11-29 14:36:34,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4005006.6666666665, ans=0.2
2023-11-29 14:36:36,873 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 14:36:37,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4005006.6666666665, ans=0.1
2023-11-29 14:36:46,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0
2023-11-29 14:36:53,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4005073.3333333335, ans=0.1
2023-11-29 14:36:56,752 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11600, loss[loss=0.06077, simple_loss=0.0862, pruned_loss=0.009525, audio_tagging_loss=0.008146, over 15659.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08878, pruned_loss=0.01172, audio_tagging_loss=0.008526, over 3044244.71 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:37:10,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4005206.6666666665, ans=0.125
2023-11-29 14:37:10,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4005206.6666666665, ans=0.125
2023-11-29 14:37:27,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4005273.3333333335, ans=0.0
2023-11-29 14:37:28,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4005273.3333333335, ans=0.125
2023-11-29 14:37:30,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600800
2023-11-29 14:37:30,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4005273.3333333335, ans=0.1
2023-11-29 14:37:46,750 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 14:37:58,588 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11650, loss[loss=0.06284, simple_loss=0.07983, pruned_loss=0.01124, audio_tagging_loss=0.01168, over 15252.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08916, pruned_loss=0.01188, audio_tagging_loss=0.008619, over 3044144.30 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:38:00,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.952e+01 9.263e+01 9.866e+01 1.051e+02 2.462e+02, threshold=1.973e+02, percent-clipped=1.0
2023-11-29 14:38:11,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0
2023-11-29 14:38:15,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4005540.0, ans=0.2
2023-11-29 14:38:15,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4005540.0, ans=0.0
2023-11-29 14:38:15,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4005540.0, ans=0.125
2023-11-29 14:38:19,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4005540.0, ans=0.0
2023-11-29 14:38:25,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4005606.6666666665, ans=0.125
2023-11-29 14:38:32,162 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600850
2023-11-29 14:38:39,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4005673.3333333335, ans=0.2
2023-11-29 14:38:41,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4005673.3333333335, ans=0.125
2023-11-29 14:38:44,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4005673.3333333335, ans=0.125
2023-11-29 14:38:59,923 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11700, loss[loss=0.05886, simple_loss=0.0809, pruned_loss=0.0119, audio_tagging_loss=0.006508, over 13807.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08888, pruned_loss=0.01191, audio_tagging_loss=0.008565, over 3034933.35 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:39:09,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4005806.6666666665, ans=0.0
2023-11-29 14:39:20,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4005873.3333333335, ans=0.0
2023-11-29 14:39:33,578 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600900
2023-11-29 14:39:35,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4005940.0, ans=0.2
2023-11-29 14:39:41,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4006006.6666666665, ans=0.0
2023-11-29 14:39:58,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4006073.3333333335, ans=0.1
2023-11-29 14:40:00,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4006073.3333333335, ans=0.0
2023-11-29 14:40:02,086 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11750, loss[loss=0.06638, simple_loss=0.08945, pruned_loss=0.01189, audio_tagging_loss=0.009763, over 15361.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08975, pruned_loss=0.01205, audio_tagging_loss=0.008481, over 3032113.23 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:40:05,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 9.067e+01 9.619e+01 1.045e+02 1.766e+02, threshold=1.924e+02, percent-clipped=0.0
2023-11-29 14:40:07,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=10.0
2023-11-29 14:40:23,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4006206.6666666665, ans=0.125
2023-11-29 14:40:25,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4006273.3333333335, ans=0.1
2023-11-29 14:40:31,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4006273.3333333335, ans=0.07
2023-11-29 14:40:33,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4006273.3333333335, ans=0.0
2023-11-29 14:40:34,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 600950
2023-11-29 14:40:52,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4006406.6666666665, ans=0.125
2023-11-29 14:41:02,834 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11800, loss[loss=0.06723, simple_loss=0.1001, pruned_loss=0.009832, audio_tagging_loss=0.007357, over 15746.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08898, pruned_loss=0.01189, audio_tagging_loss=0.008487, over 3033801.36 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:41:18,702 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 14:41:35,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601000
2023-11-29 14:41:36,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4006606.6666666665, ans=0.0
2023-11-29 14:42:02,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4006740.0, ans=0.0
2023-11-29 14:42:04,179 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11850, loss[loss=0.05988, simple_loss=0.08797, pruned_loss=0.007569, audio_tagging_loss=0.008321, over 15584.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08919, pruned_loss=0.01187, audio_tagging_loss=0.008554, over 3043001.00 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:42:05,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4006806.6666666665, ans=0.125
2023-11-29 14:42:07,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 9.068e+01 9.599e+01 1.030e+02 1.301e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-29 14:42:12,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4006806.6666666665, ans=0.1
2023-11-29 14:42:26,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.46 vs. limit=15.0
2023-11-29 14:42:34,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4006940.0, ans=0.125
2023-11-29 14:42:37,807 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601050
2023-11-29 14:42:48,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4007006.6666666665, ans=0.125
2023-11-29 14:42:49,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.57 vs. limit=22.5
2023-11-29 14:42:52,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4007073.3333333335, ans=0.125
2023-11-29 14:43:03,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4007073.3333333335, ans=0.07
2023-11-29 14:43:05,831 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11900, loss[loss=0.04934, simple_loss=0.06425, pruned_loss=0.00784, audio_tagging_loss=0.00938, over 13717.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08878, pruned_loss=0.01187, audio_tagging_loss=0.008602, over 3045200.57 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:43:15,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=4007140.0, ans=15.0
2023-11-29 14:43:34,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4007273.3333333335, ans=0.1
2023-11-29 14:43:39,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601100
2023-11-29 14:43:39,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4007273.3333333335, ans=0.0
2023-11-29 14:43:40,562 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 14:43:45,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4007340.0, ans=0.125
2023-11-29 14:43:51,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4007340.0, ans=0.125
2023-11-29 14:43:51,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4007340.0, ans=0.125
2023-11-29 14:44:01,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4007406.6666666665, ans=0.07
2023-11-29 14:44:07,726 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 11950, loss[loss=0.07482, simple_loss=0.1044, pruned_loss=0.01464, audio_tagging_loss=0.007955, over 15587.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08846, pruned_loss=0.01186, audio_tagging_loss=0.008707, over 3045671.87 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0
2023-11-29 14:44:11,319 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 8.998e+01 9.725e+01 1.040e+02 1.716e+02, threshold=1.945e+02, percent-clipped=0.0
2023-11-29 14:44:15,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4007473.3333333335, ans=0.125
2023-11-29 14:44:40,720 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601150
2023-11-29 14:44:44,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4007673.3333333335, ans=0.125
2023-11-29 14:44:51,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4007673.3333333335, ans=0.125
2023-11-29 14:45:05,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4007740.0, ans=0.125
2023-11-29 14:45:07,576 INFO [train_asr.py:1235] (3/4) Epoch 50, batch 12000, loss[loss=0.06883, simple_loss=0.09894, pruned_loss=0.01252, audio_tagging_loss=0.006834, over 15626.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08796, pruned_loss=0.01174, audio_tagging_loss=0.008828, over 3048071.94 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0
2023-11-29 14:45:07,577 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-29 14:45:29,650 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.5083, 6.3863, 6.1349, 6.1572], device='cuda:3')
2023-11-29 14:45:37,070 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4438, 3.7253, 3.0587, 3.7965], device='cuda:3')
2023-11-29 14:45:47,780 INFO [train_asr.py:1267] (3/4) Epoch 50, validation: loss=0.05813, simple_loss=0.05044, pruned_loss=0.005399, audio_tagging_loss=0.02752, over 4681554.00 frames.
2023-11-29 14:45:47,780 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-29 14:45:53,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4007806.6666666665, ans=0.2
2023-11-29 14:45:57,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0
2023-11-29 14:46:10,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=12.0
2023-11-29 14:46:11,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4007940.0, ans=0.125
2023-11-29 14:46:34,490 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 0, loss[loss=0.06293, simple_loss=0.07137, pruned_loss=0.007969, audio_tagging_loss=0.01927, over 14003.00 frames. ], tot_loss[loss=0.06293, simple_loss=0.07137, pruned_loss=0.007969, audio_tagging_loss=0.01927, over 14003.00 frames. ], batch size: 54, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 14:46:34,491 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-29 14:47:11,086 INFO [train_asr.py:1267] (3/4) Epoch 51, validation: loss=0.05803, simple_loss=0.05046, pruned_loss=0.005398, audio_tagging_loss=0.02741, over 4681554.00 frames.
2023-11-29 14:47:11,087 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-29 14:47:13,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601200
2023-11-29 14:47:40,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4008106.6666666665, ans=0.0
2023-11-29 14:47:45,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.507e+01 9.981e+01 1.081e+02 1.521e+02, threshold=1.996e+02, percent-clipped=0.0
2023-11-29 14:47:56,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5
2023-11-29 14:47:58,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4008173.3333333335, ans=0.125
2023-11-29 14:48:13,519 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 50, loss[loss=0.09402, simple_loss=0.127, pruned_loss=0.01792, audio_tagging_loss=0.01258, over 16164.00 frames. ], tot_loss[loss=0.07239, simple_loss=0.08841, pruned_loss=0.01156, audio_tagging_loss=0.01662, over 692367.56 frames. ], batch size: 58, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 14:48:15,997 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601250
2023-11-29 14:48:17,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0
2023-11-29 14:48:46,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0
2023-11-29 14:48:46,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0
2023-11-29 14:49:07,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4008573.3333333335, ans=0.0
2023-11-29 14:49:15,448 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 100, loss[loss=0.05575, simple_loss=0.07095, pruned_loss=0.006927, audio_tagging_loss=0.01335, over 15010.00 frames. ], tot_loss[loss=0.07298, simple_loss=0.09025, pruned_loss=0.01211, audio_tagging_loss=0.01575, over 1215386.64 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 14:49:16,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4008640.0, ans=0.125
2023-11-29 14:49:17,821 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601300
2023-11-29 14:49:33,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=22.5
2023-11-29 14:49:51,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.732e+01 9.896e+01 1.042e+02 1.115e+02 1.364e+02, threshold=2.085e+02, percent-clipped=0.0
2023-11-29 14:49:59,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4008840.0, ans=10.0
2023-11-29 14:50:00,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4008840.0, ans=0.0
2023-11-29 14:50:11,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4008906.6666666665, ans=0.125
2023-11-29 14:50:17,033 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 150, loss[loss=0.06819, simple_loss=0.0907, pruned_loss=0.01293, audio_tagging_loss=0.009909, over 15360.00 frames. ], tot_loss[loss=0.07022, simple_loss=0.08948, pruned_loss=0.01154, audio_tagging_loss=0.01394, over 1619284.71 frames. ], batch size: 58, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 14:50:19,569 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601350
2023-11-29 14:50:50,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4009106.6666666665, ans=0.0
2023-11-29 14:50:54,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4009173.3333333335, ans=0.125
2023-11-29 14:51:07,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4009240.0, ans=0.125
2023-11-29 14:51:11,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4009240.0, ans=0.125
2023-11-29 14:51:19,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0
2023-11-29 14:51:19,921 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 200, loss[loss=0.06332, simple_loss=0.09633, pruned_loss=0.007728, audio_tagging_loss=0.007429, over 16273.00 frames. ], tot_loss[loss=0.06903, simple_loss=0.09034, pruned_loss=0.01153, audio_tagging_loss=0.01233, over 1932906.88 frames. ], batch size: 60, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 14:51:22,357 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601400
2023-11-29 14:51:27,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4009306.6666666665, ans=0.0
2023-11-29 14:51:55,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 9.139e+01 9.906e+01 1.061e+02 1.460e+02, threshold=1.981e+02, percent-clipped=0.0
2023-11-29 14:52:02,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4009506.6666666665, ans=0.1
2023-11-29 14:52:21,834 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 250, loss[loss=0.07191, simple_loss=0.09884, pruned_loss=0.01368, audio_tagging_loss=0.008815, over 15456.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.08989, pruned_loss=0.01163, audio_tagging_loss=0.01127, over 2182756.04 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 14:52:24,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601450
2023-11-29 14:52:45,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5
2023-11-29 14:52:48,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4009773.3333333335, ans=0.025
2023-11-29 14:52:52,007 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 14:53:08,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4009840.0, ans=0.125
2023-11-29 14:53:23,646 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 300, loss[loss=0.04886, simple_loss=0.06444, pruned_loss=0.007648, audio_tagging_loss=0.008995, over 14204.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09115, pruned_loss=0.01189, audio_tagging_loss=0.01029, over 2370476.30 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 14:53:26,667 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601500
2023-11-29 14:53:56,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4010106.6666666665, ans=0.125
2023-11-29 14:53:56,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2023-11-29 14:53:59,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.446e+01 1.010e+02 1.084e+02 1.415e+02, threshold=2.020e+02, percent-clipped=0.0
2023-11-29 14:54:26,183 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 350, loss[loss=0.05056, simple_loss=0.06275, pruned_loss=0.01073, audio_tagging_loss=0.008451, over 14687.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09034, pruned_loss=0.01184, audio_tagging_loss=0.009763, over 2519461.86 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 14:54:29,224 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601550
2023-11-29 14:54:43,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4010373.3333333335, ans=0.0
2023-11-29 14:54:49,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4010440.0, ans=0.125
2023-11-29 14:54:49,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4010440.0, ans=0.1
2023-11-29 14:54:55,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4010440.0, ans=0.125
2023-11-29 14:55:01,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4010440.0, ans=0.0
2023-11-29 14:55:05,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4010506.6666666665, ans=0.125
2023-11-29 14:55:25,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4010573.3333333335, ans=0.125
2023-11-29 14:55:27,917 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 400, loss[loss=0.05623, simple_loss=0.08004, pruned_loss=0.007606, audio_tagging_loss=0.008607, over 15166.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08985, pruned_loss=0.01175, audio_tagging_loss=0.009386, over 2636783.35 frames. ], batch size: 58, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 14:55:30,261 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601600
2023-11-29 14:55:34,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4010640.0, ans=0.125
2023-11-29 14:55:53,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4010773.3333333335, ans=0.125
2023-11-29 14:56:00,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4010773.3333333335, ans=0.125
2023-11-29 14:56:04,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.094e+01 9.565e+01 1.047e+02 1.359e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-29 14:56:22,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4010906.6666666665, ans=0.125
2023-11-29 14:56:22,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4010906.6666666665, ans=0.0
2023-11-29 14:56:22,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4010906.6666666665, ans=0.0
2023-11-29 14:56:29,839 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 450, loss[loss=0.05847, simple_loss=0.08809, pruned_loss=0.007564, audio_tagging_loss=0.006863, over 15840.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.0893, pruned_loss=0.0118, audio_tagging_loss=0.009165, over 2734879.80 frames. ], batch size: 60, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 14:56:32,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601650
2023-11-29 14:56:35,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4010973.3333333335, ans=0.1
2023-11-29 14:56:35,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=12.0
2023-11-29 14:56:47,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4011040.0, ans=0.0
2023-11-29 14:57:09,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4011173.3333333335, ans=0.125
2023-11-29 14:57:16,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4011173.3333333335, ans=0.1
2023-11-29 14:57:23,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4011240.0, ans=0.1
2023-11-29 14:57:24,455 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 14:57:31,212 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 500, loss[loss=0.08489, simple_loss=0.1151, pruned_loss=0.01944, audio_tagging_loss=0.007897, over 15854.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08972, pruned_loss=0.01169, audio_tagging_loss=0.008959, over 2809075.69 frames. ], batch size: 59, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 14:57:31,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4011306.6666666665, ans=0.2
2023-11-29 14:57:33,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601700
2023-11-29 14:57:57,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0
2023-11-29 14:57:58,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5
2023-11-29 14:57:59,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.26 vs. limit=10.0
2023-11-29 14:58:01,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4011440.0, ans=0.0
2023-11-29 14:58:07,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.018e+01 9.718e+01 1.038e+02 1.323e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-29 14:58:11,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4011506.6666666665, ans=0.04949747468305833
2023-11-29 14:58:11,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0
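The scaling.py:1022 Whitening entries track how close each module's activations are to having a white (isotropic) covariance: the metric is 1.0 for perfectly white features and grows as the eigenvalue spread widens, and a corrective penalty is applied only when the metric approaches the logged limit. A rough, illustrative version of such a metric (a simplification under that assumption, not the scaling.py implementation):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # Ratio of the mean squared eigenvalue of the feature covariance
        # to the squared mean eigenvalue, averaged over channel groups.
        # Equals 1.0 for perfectly white features; larger otherwise.
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x.transpose(0, 1)                      # (groups, frames, c)
        cov = x.transpose(1, 2) @ x / num_frames   # (groups, c, c)
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs ** 2).mean(dim=-1) / eigs.mean(dim=-1) ** 2).mean().item()

    # White noise scores low, well under limits like 12.0 or 22.5:
    print(whitening_metric(torch.randn(1000, 384)))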
2023-11-29 14:58:21,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4011573.3333333335, ans=0.0
2023-11-29 14:58:26,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4011573.3333333335, ans=0.0
2023-11-29 14:58:32,329 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 550, loss[loss=0.07537, simple_loss=0.1136, pruned_loss=0.01279, audio_tagging_loss=0.005771, over 15865.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08931, pruned_loss=0.0116, audio_tagging_loss=0.00888, over 2867056.33 frames. ], batch size: 58, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 14:58:34,964 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601750
2023-11-29 14:58:45,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4011706.6666666665, ans=0.125
2023-11-29 14:59:03,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4011773.3333333335, ans=0.1
2023-11-29 14:59:04,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4011773.3333333335, ans=0.125
2023-11-29 14:59:15,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4011840.0, ans=0.07
2023-11-29 14:59:35,613 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 600, loss[loss=0.06355, simple_loss=0.08588, pruned_loss=0.0125, audio_tagging_loss=0.008116, over 15031.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.089, pruned_loss=0.0117, audio_tagging_loss=0.008808, over 2904154.37 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 14:59:38,700 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601800
2023-11-29 14:59:43,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4011973.3333333335, ans=0.0
2023-11-29 14:59:52,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4012040.0, ans=0.125
2023-11-29 15:00:12,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.089e+01 9.742e+01 1.033e+02 1.328e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-29 15:00:21,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0
2023-11-29 15:00:30,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4012240.0, ans=0.125
2023-11-29 15:00:32,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4012240.0, ans=0.0
2023-11-29 15:00:38,267 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 650, loss[loss=0.06053, simple_loss=0.08001, pruned_loss=0.009273, audio_tagging_loss=0.01126, over 14999.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.0896, pruned_loss=0.01165, audio_tagging_loss=0.008707, over 2934408.12 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0
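The scaling.py:213 ScheduledFloat entries show that many regularization hyperparameters here (skip rates, dropout_p, balancer probabilities) are schedules over the global batch count rather than constants; "ans" is the value in effect at the printed batch_count. A minimal sketch of a piecewise-linear schedule of that kind, assuming linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative:

    class PiecewiseLinearFloat:
        # A float that depends piecewise-linearly on the batch count.
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # e.g. a skip rate decaying from 0.5 to 0.0 over the first 20000
    # batches; at batch_count ~4e6 it has long since reached 0.0, as
    # the ff3_skip_rate entries above show.
    skip_rate = PiecewiseLinearFloat((0.0, 0.5), (20000.0, 0.0))
    assert skip_rate(4011573.0) == 0.0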
2023-11-29 15:00:40,745 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601850
2023-11-29 15:00:42,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4012306.6666666665, ans=0.125
2023-11-29 15:00:53,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4012373.3333333335, ans=0.125
2023-11-29 15:00:56,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=12.0
2023-11-29 15:01:03,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0
2023-11-29 15:01:06,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4012440.0, ans=0.0
2023-11-29 15:01:39,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0
2023-11-29 15:01:39,799 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 700, loss[loss=0.0826, simple_loss=0.1124, pruned_loss=0.01607, audio_tagging_loss=0.01035, over 15687.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0897, pruned_loss=0.01186, audio_tagging_loss=0.008669, over 2960984.16 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 15:01:42,851 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601900
2023-11-29 15:01:43,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=22.5
2023-11-29 15:01:57,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4012706.6666666665, ans=0.0
2023-11-29 15:01:57,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0
2023-11-29 15:02:11,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4012773.3333333335, ans=0.125
2023-11-29 15:02:15,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.121e+01 8.992e+01 9.586e+01 1.033e+02 1.329e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-29 15:02:41,756 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 750, loss[loss=0.05396, simple_loss=0.07797, pruned_loss=0.006301, audio_tagging_loss=0.008667, over 14815.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09079, pruned_loss=0.01207, audio_tagging_loss=0.008561, over 2971604.18 frames. ], batch size: 55, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 15:02:44,186 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 601950
2023-11-29 15:02:48,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4012973.3333333335, ans=0.04949747468305833
2023-11-29 15:03:00,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4013040.0, ans=0.125
2023-11-29 15:03:10,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0
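The grad_scale field in the batch summaries is the dynamic loss scale of mixed-precision training (use_fp16=True for this run): it halves when scaled gradients overflow, as in the drop from 32.0 at batch 700 to 16.0 at batch 750 above, and doubles again after a long enough run of clean steps. A generic torch.cuda.amp sketch of that loop; model, optimizer, and batch are placeholders, not this script's objects:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,     # same order as the logged grad_scale values
        growth_factor=2.0,   # doubles after growth_interval clean steps
        backoff_factor=0.5,  # halves on overflow
    )

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if grads overflowed
        scaler.update()         # adjusts the scale logged as grad_scale
        return loss.detach(), scaler.get_scale()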
2023-11-29 15:03:15,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4013106.6666666665, ans=0.125
2023-11-29 15:03:41,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4013240.0, ans=0.1
2023-11-29 15:03:44,101 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 800, loss[loss=0.06884, simple_loss=0.09594, pruned_loss=0.01178, audio_tagging_loss=0.009096, over 15005.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09037, pruned_loss=0.012, audio_tagging_loss=0.008681, over 2982365.49 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 15:03:46,529 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 602000
2023-11-29 15:04:01,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4013373.3333333335, ans=0.1
2023-11-29 15:04:21,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.185e+01 9.281e+01 9.942e+01 1.059e+02 1.465e+02, threshold=1.988e+02, percent-clipped=0.0
2023-11-29 15:04:32,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4013573.3333333335, ans=0.0
2023-11-29 15:04:46,292 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 850, loss[loss=0.05423, simple_loss=0.0753, pruned_loss=0.007905, audio_tagging_loss=0.008671, over 15511.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08982, pruned_loss=0.01184, audio_tagging_loss=0.008807, over 2999205.06 frames. ], batch size: 58, lr: 1.33e-03, grad_scale: 32.0
2023-11-29 15:04:48,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 602050
2023-11-29 15:05:03,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4013706.6666666665, ans=0.0
2023-11-29 15:05:12,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0
2023-11-29 15:05:24,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4013840.0, ans=0.125
2023-11-29 15:05:28,513 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 15:05:48,317 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 900, loss[loss=0.05806, simple_loss=0.08063, pruned_loss=0.009863, audio_tagging_loss=0.007886, over 15926.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08947, pruned_loss=0.01195, audio_tagging_loss=0.008886, over 3006682.03 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 15:05:50,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 602100
2023-11-29 15:05:55,023 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 15:06:26,396 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 9.010e+01 9.603e+01 1.027e+02 1.199e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-29 15:06:43,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5
2023-11-29 15:06:51,392 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 950, loss[loss=0.06699, simple_loss=0.09723, pruned_loss=0.01136, audio_tagging_loss=0.007018, over 16727.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08915, pruned_loss=0.01194, audio_tagging_loss=0.008779, over 3019613.79 frames. ], batch size: 61, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 15:06:53,951 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 602150
2023-11-29 15:06:56,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4014306.6666666665, ans=0.2
2023-11-29 15:06:56,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4014306.6666666665, ans=0.025
2023-11-29 15:07:01,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4014306.6666666665, ans=0.125
2023-11-29 15:07:04,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4014373.3333333335, ans=0.0
2023-11-29 15:07:22,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4014440.0, ans=0.0
2023-11-29 15:07:35,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4014506.6666666665, ans=0.2
2023-11-29 15:07:45,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4014573.3333333335, ans=0.0
2023-11-29 15:07:49,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0
2023-11-29 15:07:53,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4014640.0, ans=0.125
2023-11-29 15:07:53,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4014640.0, ans=0.0
2023-11-29 15:07:53,943 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 1000, loss[loss=0.06523, simple_loss=0.08228, pruned_loss=0.01544, audio_tagging_loss=0.00865, over 13739.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08957, pruned_loss=0.01214, audio_tagging_loss=0.008598, over 3019120.92 frames. ], batch size: 53, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 15:07:56,544 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 602200
2023-11-29 15:08:00,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0
2023-11-29 15:08:01,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4014640.0, ans=0.125
2023-11-29 15:08:02,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4014640.0, ans=0.025
2023-11-29 15:08:24,038 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
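The WARNING above is a length filter at work: the convolutional front end turns 100 input frames into ((100 - 7) // 2 + 1) // 2 = 23 output frames (subsampling factor 4), and a transducer loss cannot align 23 encoder frames with 24 BPE tokens, so the cut is dropped. (The "Dummy text" transcript is the placeholder attached to AudioSet audio-tagging cuts.) A sketch of that check, assuming this subsampling arithmetic; sp stands for the BPE model and cut for a lhotse cut:

    def keep_cut(cut, sp) -> bool:
        # Frames after the conv front end (subsampling factor 4).
        T = ((cut.num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        # The transducer needs at least as many frames as tokens.
        return T >= len(tokens)

    # Logged cut: ((100 - 7) // 2 + 1) // 2 == 23 < 24 tokens -> excluded.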
2023-11-29 15:08:28,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4014773.3333333335, ans=0.125
2023-11-29 15:08:31,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4014840.0, ans=0.125
2023-11-29 15:08:33,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 9.222e+01 9.984e+01 1.098e+02 2.505e+02, threshold=1.997e+02, percent-clipped=1.0
2023-11-29 15:08:40,392 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 15:08:51,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4014906.6666666665, ans=0.1
2023-11-29 15:08:54,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4014906.6666666665, ans=0.2
2023-11-29 15:08:56,577 INFO [train_asr.py:1235] (3/4) Epoch 51, batch 1050, loss[loss=0.05634, simple_loss=0.07997, pruned_loss=0.007754, audio_tagging_loss=0.008598, over 15588.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08908, pruned_loss=0.01178, audio_tagging_loss=0.008486, over 3021074.86 frames. ], batch size: 61, lr: 1.33e-03, grad_scale: 16.0
2023-11-29 15:08:59,076 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 602250
2023-11-29 15:08:59,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=4014973.3333333335, ans=10.0
2023-11-29 15:09:28,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0
2023-11-29 15:09:30,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0
2023-11-29 15:09:31,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4015106.6666666665, ans=0.125
2023-11-29 15:09:42,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4015173.3333333335, ans=10.0
2023-11-29 15:09:50,706 INFO [checkpoint.py:75] (3/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/bad-model-3.pt